From: Yafang Shao <laoar.shao@gmail.com>
Date: Thu, 11 Jul 2024 15:21:43 +0800
Subject: Re: [PATCH 0/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org
In-Reply-To: <87sewga0wx.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240707094956.94654-1-laoar.shao@gmail.com> <874j8yar3z.fsf@yhuang6-desk2.ccr.corp.intel.com> <87sewga0wx.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Jul 11, 2024 at 2:40 PM Huang, Ying wrote:
>
> Yafang Shao writes:
>
> > On Wed, Jul 10, 2024 at 11:02 AM Huang, Ying wrote:
> >>
> >> Yafang Shao writes:
> >>
> >> > Background
> >> > ==========
> >> >
> >> > In our containerized environment, we have a specific type of container
> >> > that runs 18 processes, each consuming approximately 6GB of RSS. These
> >> > processes are organized as separate processes rather than threads due
> >> > to the Python Global Interpreter Lock (GIL) being a bottleneck in a
> >> > multi-threaded setup. Upon the exit of these containers, other
> >> > containers hosted on the same machine experience significant latency
> >> > spikes.
> >> >
> >> > Investigation
> >> > =============
> >> >
> >> > My investigation using perf tracing revealed that the root cause of
> >> > these spikes is the simultaneous execution of exit_mmap() by each of
> >> > the exiting processes. This concurrent access to the zone->lock
> >> > results in contention, which becomes a hotspot and negatively impacts
> >> > performance. The perf results clearly indicate this contention as a
> >> > primary contributor to the observed latency issues.
> >> >
> >> > +   77.02%     0.00%  uwsgi  [kernel.kallsyms]  [k] mmput
> >> > -   76.98%     0.01%  uwsgi  [kernel.kallsyms]  [k] exit_mmap
> >> >    - 76.97% exit_mmap
> >> >       - 58.58% unmap_vmas
> >> >          - 58.55% unmap_single_vma
> >> >             - unmap_page_range
> >> >                - 58.32% zap_pte_range
> >> >                   - 42.88% tlb_flush_mmu
> >> >                      - 42.76% free_pages_and_swap_cache
> >> >                         - 41.22% release_pages
> >> >                            - 33.29% free_unref_page_list
> >> >                               - 32.37% free_unref_page_commit
> >> >                                  - 31.64% free_pcppages_bulk
> >> >                                     + 28.65% _raw_spin_lock
> >> >                                       1.28% __list_del_entry_valid
> >> >                            + 3.25% folio_lruvec_lock_irqsave
> >> >                            + 0.75% __mem_cgroup_uncharge_list
> >> >                              0.60% __mod_lruvec_state
> >> >                           1.07% free_swap_cache
> >> >                   + 11.69% page_remove_rmap
> >> >                     0.64% __mod_lruvec_page_state
> >> >       - 17.34% remove_vma
> >> >          - 17.25% vm_area_free
> >> >             - 17.23% kmem_cache_free
> >> >                - 17.15% __slab_free
> >> >                   - 14.56% discard_slab
> >> >                        free_slab
> >> >                        __free_slab
> >> >                        __free_pages
> >> >                        - free_unref_page
> >> >                           - 13.50% free_unref_page_commit
> >> >                              - free_pcppages_bulk
> >> >                                 + 13.44% _raw_spin_lock
> >>
> >> I don't think your change will reduce zone->lock contention cycles. So,
> >> I don't find the value of the above data.
> >>
> >> > By enabling the mm_page_pcpu_drain() we can locate the pertinent page,
> >> > with the majority of them being regular order-0 user pages.
> >> >
> >> > <...>-1540432 [224] d..3. 618048.023883: mm_page_pcpu_drain: page=0000000035a1b0b7 pfn=0x11c19c72 order=0 migratetype=1
> >> > <...>-1540432 [224] d..3. 618048.023887:
> >> > => free_pcppages_bulk
> >> > => free_unref_page_commit
> >> > => free_unref_page_list
> >> > => release_pages
> >> > => free_pages_and_swap_cache
> >> > => tlb_flush_mmu
> >> > => zap_pte_range
> >> > => unmap_page_range
> >> > => unmap_single_vma
> >> > => unmap_vmas
> >> > => exit_mmap
> >> > => mmput
> >> > => do_exit
> >> > => do_group_exit
> >> > => get_signal
> >> > => arch_do_signal_or_restart
> >> > => exit_to_user_mode_prepare
> >> > => syscall_exit_to_user_mode
> >> > => do_syscall_64
> >> > => entry_SYSCALL_64_after_hwframe
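
Side note: a trace like the one quoted above can be captured from the
kmem:mm_page_pcpu_drain tracepoint via tracefs. The commands below are
only a sketch of one way to do that (they assume tracefs is mounted at
/sys/kernel/tracing, or /sys/kernel/debug/tracing on older setups, and
use the generic per-event "stacktrace" trigger); they are not
necessarily the exact steps used for the output above:

    # sketch: enable the tracepoint and dump a stack on every pcp drain
    echo 1 > /sys/kernel/tracing/events/kmem/mm_page_pcpu_drain/enable
    echo stacktrace > /sys/kernel/tracing/events/kmem/mm_page_pcpu_drain/trigger
    cat /sys/kernel/tracing/trace_pipe
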
> >> >
> >> > The servers experiencing these issues are equipped with impressive
> >> > hardware specifications, including 256 CPUs and 1TB of memory, all
> >> > within a single NUMA node. The zoneinfo is as follows,
> >> >
> >> > Node 0, zone   Normal
> >> >   pages free     144465775
> >> >         boost    0
> >> >         min      1309270
> >> >         low      1636587
> >> >         high     1963904
> >> >         spanned  564133888
> >> >         present  296747008
> >> >         managed  291974346
> >> >         cma      0
> >> >         protection: (0, 0, 0, 0)
> >> > ...
> >> >   pagesets
> >> >     cpu: 0
> >> >               count: 2217
> >> >               high:  6392
> >> >               batch: 63
> >> >   vm stats threshold: 125
> >> >     cpu: 1
> >> >               count: 4510
> >> >               high:  6392
> >> >               batch: 63
> >> >   vm stats threshold: 125
> >> >     cpu: 2
> >> >               count: 3059
> >> >               high:  6392
> >> >               batch: 63
> >> >
> >> > ...
> >> >
> >> > The pcp high is around 100 times the batch size.
> >> >
> >> > I also traced the latency associated with the free_pcppages_bulk()
> >> > function during the container exit process:
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 0        |                                        |
> >> >        256 -> 511        : 148      |*****************                       |
> >> >        512 -> 1023       : 334      |****************************************|
> >> >       1024 -> 2047       : 33       |***                                     |
> >> >       2048 -> 4095       : 5        |                                        |
> >> >       4096 -> 8191       : 7        |                                        |
> >> >       8192 -> 16383      : 12       |*                                       |
> >> >      16384 -> 32767      : 30       |***                                     |
> >> >      32768 -> 65535      : 21       |**                                      |
> >> >      65536 -> 131071     : 15       |*                                       |
> >> >     131072 -> 262143     : 27       |***                                     |
> >> >     262144 -> 524287     : 84       |**********                              |
> >> >     524288 -> 1048575    : 203      |************************                |
> >> >    1048576 -> 2097151    : 284      |**********************************      |
> >> >    2097152 -> 4194303    : 327      |*************************************** |
> >> >    4194304 -> 8388607    : 215      |*************************               |
> >> >    8388608 -> 16777215   : 116      |*************                           |
> >> >   16777216 -> 33554431   : 47       |*****                                   |
> >> >   33554432 -> 67108863   : 8        |                                        |
> >> >   67108864 -> 134217727  : 3        |                                        |
> >> >
> >> > The latency can reach tens of milliseconds.
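
Side note: a latency histogram in this format can be collected with, for
example, the funclatency tool from BCC. The invocation below is only an
illustrative sketch (it assumes BCC is installed, with the tool path
varying by distribution, and that free_pcppages_bulk is kprobe-able on
the running kernel); it is not necessarily the exact tooling used for
the numbers above:

    # sketch: nanosecond latency histogram of free_pcppages_bulk() for 60s
    /usr/share/bcc/tools/funclatency -d 60 free_pcppages_bulk
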
> >> >
> >> > Experimenting
> >> > =============
> >> >
> >> > vm.percpu_pagelist_high_fraction
> >> > --------------------------------
> >> >
> >> > The kernel version currently deployed in our production environment is the
> >> > stable 6.1.y, and my initial strategy involves optimizing the
> >>
> >> IMHO, we should focus on upstream activity in the cover letter and patch
> >> description. And I don't think that it's necessary to describe the
> >> alternative solution with too much details.
> >>
> >> > vm.percpu_pagelist_high_fraction parameter. By increasing the value of
> >> > vm.percpu_pagelist_high_fraction, I aim to diminish the batch size during
> >> > page draining, which subsequently leads to a substantial reduction in
> >> > latency. After setting the sysctl value to 0x7fffffff, I observed a notable
> >> > improvement in latency.
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 120      |                                        |
> >> >        256 -> 511        : 365      |*                                       |
> >> >        512 -> 1023       : 201      |                                        |
> >> >       1024 -> 2047       : 103      |                                        |
> >> >       2048 -> 4095       : 84       |                                        |
> >> >       4096 -> 8191       : 87       |                                        |
> >> >       8192 -> 16383      : 4777     |**************                          |
> >> >      16384 -> 32767      : 10572    |*******************************         |
> >> >      32768 -> 65535      : 13544    |****************************************|
> >> >      65536 -> 131071     : 12723    |*************************************   |
> >> >     131072 -> 262143     : 8604     |*************************               |
> >> >     262144 -> 524287     : 3659     |**********                              |
> >> >     524288 -> 1048575    : 921      |**                                      |
> >> >    1048576 -> 2097151    : 122      |                                        |
> >> >    2097152 -> 4194303    : 5        |                                        |
> >> >
> >> > However, augmenting vm.percpu_pagelist_high_fraction can also decrease the
> >> > pcp high watermark size to a minimum of four times the batch size. While
> >> > this could theoretically affect throughput, as highlighted by Ying[0], we
> >> > have yet to observe any significant difference in throughput within our
> >> > production environment after implementing this change.
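
Side note: the experiment described in this subsection amounts to raising
vm.percpu_pagelist_high_fraction to its maximum value. Applied at
runtime, that is roughly the following (2147483647 is the decimal form of
0x7fffffff; shown purely as an illustration, and it does not persist
across reboots):

    # sketch: shrink the pcp high watermark by maximizing the fraction
    sysctl -w vm.percpu_pagelist_high_fraction=2147483647
    # equivalent:
    echo 2147483647 > /proc/sys/vm/percpu_pagelist_high_fraction
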
> >> >
> >> > Backporting the series "mm: PCP high auto-tuning"
> >> > -------------------------------------------------
> >>
> >> Again, not upstream activity. We can describe the upstream behavior
> >> directly.
> >
> > Andrew has requested that I provide a more comprehensive analysis of
> > this issue, and in response, I have endeavored to outline all the
> > pertinent details in a thorough and detailed manner.
>
> IMHO, upstream activity can provide comprehensive analysis of the issue
> too. And, your patch has changed much from the first version. It's
> better to describe your current version.

After backporting the pcp auto-tuning series to our 6.1.y branch, the pcp
code is almost identical to what is in the upstream kernel. I have
documented the data from the backported version in detail, which gives a
clear picture of the results. Note, however, that I am unable to run an
upstream kernel directly in our production environment due to practical
constraints.

>
> >>
> >> > My second endeavor was to backport the series titled
> >> > "mm: PCP high auto-tuning"[1], which comprises nine individual patches,
> >> > into our 6.1.y stable kernel version. Subsequent to its deployment in our
> >> > production environment, I noted a pronounced reduction in latency. The
> >> > observed outcomes are as enumerated below:
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 0        |                                        |
> >> >        256 -> 511        : 0        |                                        |
> >> >        512 -> 1023       : 0        |                                        |
> >> >       1024 -> 2047       : 2        |                                        |
> >> >       2048 -> 4095       : 11       |                                        |
> >> >       4096 -> 8191       : 3        |                                        |
> >> >       8192 -> 16383      : 1        |                                        |
> >> >      16384 -> 32767      : 2        |                                        |
> >> >      32768 -> 65535      : 7        |                                        |
> >> >      65536 -> 131071     : 198      |*********                               |
> >> >     131072 -> 262143     : 530      |************************                |
> >> >     262144 -> 524287     : 824      |**************************************  |
> >> >     524288 -> 1048575    : 852      |****************************************|
> >> >    1048576 -> 2097151    : 714      |*********************************       |
> >> >    2097152 -> 4194303    : 389      |******************                      |
> >> >    4194304 -> 8388607    : 143      |******                                  |
> >> >    8388608 -> 16777215   : 29       |*                                       |
> >> >   16777216 -> 33554431   : 1        |                                        |
> >> >
> >> > Compared to the previous data, the maximum latency has been reduced to
> >> > less than 30ms.
> >>
> >> People don't care too much about page freeing latency during processes
> >> exiting. Instead, they care more about the process exiting time, that
> >> is, throughput. So, it's better to show the page allocation latency
> >> which is affected by the simultaneous processes exiting.
> >
> > I'm confused also. Is this issue really hard to understand ?
>
> IMHO, it's better to prove the issue directly. If you cannot prove it
> directly, you can try alternative one and describe why.

Not every piece of data can be gathered easily or directly. The primary
focus here is the zone->lock contention, which means measuring the
latency it introduces, and free_pcppages_bulk() is a convenient place to
measure that. This is why I chose to measure the latency of
free_pcppages_bulk() specifically.

The reason I did not measure allocation latency is that doing so would
require finding a willing participant to endure the potential delays, and
nobody volunteered. Measuring the latency of free_pcppages_bulk(), in
contrast, only requires identifying and experimenting on the side that
causes the delays, which makes it a much more practical approach.

--
Regards
Yafang