From: Yafang Shao <laoar.shao@gmail.com>
Date: Thu, 11 Jul 2024 15:21:43 +0800
Subject: Re: [PATCH 0/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org
In-Reply-To: <87sewga0wx.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240707094956.94654-1-laoar.shao@gmail.com> <874j8yar3z.fsf@yhuang6-desk2.ccr.corp.intel.com> <87sewga0wx.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Jul 11, 2024 at 2:40 PM Huang, Ying wrote:
>
> Yafang Shao writes:
>
> > On Wed, Jul 10, 2024 at 11:02 AM Huang, Ying wrote:
> >>
> >> Yafang Shao writes:
> >>
> >> > Background
> >> > ==========
> >> >
> >> > In our containerized environment, we have a specific type of container
> >> > that runs 18 processes, each consuming approximately 6GB of RSS. These
> >> > processes are organized as separate processes rather than threads due
> >> > to the Python Global Interpreter Lock (GIL) being a bottleneck in a
> >> > multi-threaded setup. Upon the exit of these containers, other
> >> > containers hosted on the same machine experience significant latency
> >> > spikes.
> >> >
> >> > Investigation
> >> > =============
> >> >
> >> > My investigation using perf tracing revealed that the root cause of
> >> > these spikes is the simultaneous execution of exit_mmap() by each of
> >> > the exiting processes. This concurrent access to the zone->lock
> >> > results in contention, which becomes a hotspot and negatively impacts
> >> > performance. The perf results clearly indicate this contention as a
> >> > primary contributor to the observed latency issues.
> >> >
> >> > +   77.02%     0.00%  uwsgi  [kernel.kallsyms]  [k] mmput
> >> > -   76.98%     0.01%  uwsgi  [kernel.kallsyms]  [k] exit_mmap
> >> >    - 76.97% exit_mmap
> >> >       - 58.58% unmap_vmas
> >> >          - 58.55% unmap_single_vma
> >> >             - unmap_page_range
> >> >                - 58.32% zap_pte_range
> >> >                   - 42.88% tlb_flush_mmu
> >> >                      - 42.76% free_pages_and_swap_cache
> >> >                         - 41.22% release_pages
> >> >                            - 33.29% free_unref_page_list
> >> >                               - 32.37% free_unref_page_commit
> >> >                                  - 31.64% free_pcppages_bulk
> >> >                                     + 28.65% _raw_spin_lock
> >> >                                       1.28% __list_del_entry_valid
> >> >                            + 3.25% folio_lruvec_lock_irqsave
> >> >                            + 0.75% __mem_cgroup_uncharge_list
> >> >                              0.60% __mod_lruvec_state
> >> >                           1.07% free_swap_cache
> >> >                   + 11.69% page_remove_rmap
> >> >                     0.64% __mod_lruvec_page_state
> >> >       - 17.34% remove_vma
> >> >          - 17.25% vm_area_free
> >> >             - 17.23% kmem_cache_free
> >> >                - 17.15% __slab_free
> >> >                   - 14.56% discard_slab
> >> >                        free_slab
> >> >                        __free_slab
> >> >                        __free_pages
> >> >                        - free_unref_page
> >> >                           - 13.50% free_unref_page_commit
> >> >                              - free_pcppages_bulk
> >> >                                 + 13.44% _raw_spin_lock
> >>
> >> I don't think your change will reduce zone->lock contention cycles. So,
> >> I don't find the value of the above data.
> >>
> >> > By enabling the mm_page_pcpu_drain() we can locate the pertinent page,
> >> > with the majority of them being regular order-0 user pages.
> >> >
> >> > <...>-1540432 [224] d..3. 618048.023883: mm_page_pcpu_drain: page=0000000035a1b0b7 pfn=0x11c19c72 order=0 migratetype=1
> >> > <...>-1540432 [224] d..3. 618048.023887:
> >> > => free_pcppages_bulk
> >> > => free_unref_page_commit
> >> > => free_unref_page_list
> >> > => release_pages
> >> > => free_pages_and_swap_cache
> >> > => tlb_flush_mmu
> >> > => zap_pte_range
> >> > => unmap_page_range
> >> > => unmap_single_vma
> >> > => unmap_vmas
> >> > => exit_mmap
> >> > => mmput
> >> > => do_exit
> >> > => do_group_exit
> >> > => get_signal
> >> > => arch_do_signal_or_restart
> >> > => exit_to_user_mode_prepare
> >> > => syscall_exit_to_user_mode
> >> > => do_syscall_64
> >> > => entry_SYSCALL_64_after_hwframe
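
Side note: a trace like the one quoted above can be captured from the
kmem:mm_page_pcpu_drain tracepoint via tracefs. The commands below are
only a sketch of one way to do that (they assume tracefs is mounted at
/sys/kernel/tracing, or /sys/kernel/debug/tracing on older setups, and
use the generic per-event "stacktrace" trigger); they are not
necessarily the exact steps used for the output above:

    # sketch: enable the tracepoint and dump a stack on every pcp drain
    echo 1 > /sys/kernel/tracing/events/kmem/mm_page_pcpu_drain/enable
    echo stacktrace > /sys/kernel/tracing/events/kmem/mm_page_pcpu_drain/trigger
    cat /sys/kernel/tracing/trace_pipe
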
> >> >
> >> > The servers experiencing these issues are equipped with impressive
> >> > hardware specifications, including 256 CPUs and 1TB of memory, all
> >> > within a single NUMA node. The zoneinfo is as follows,
> >> >
> >> > Node 0, zone   Normal
> >> >   pages free     144465775
> >> >         boost    0
> >> >         min      1309270
> >> >         low      1636587
> >> >         high     1963904
> >> >         spanned  564133888
> >> >         present  296747008
> >> >         managed  291974346
> >> >         cma      0
> >> >         protection: (0, 0, 0, 0)
> >> > ...
> >> >   pagesets
> >> >     cpu: 0
> >> >               count: 2217
> >> >               high:  6392
> >> >               batch: 63
> >> >   vm stats threshold: 125
> >> >     cpu: 1
> >> >               count: 4510
> >> >               high:  6392
> >> >               batch: 63
> >> >   vm stats threshold: 125
> >> >     cpu: 2
> >> >               count: 3059
> >> >               high:  6392
> >> >               batch: 63
> >> >
> >> > ...
> >> >
> >> > The pcp high is around 100 times the batch size.
> >> >
> >> > I also traced the latency associated with the free_pcppages_bulk()
> >> > function during the container exit process:
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 0        |                                        |
> >> >        256 -> 511        : 148      |*****************                       |
> >> >        512 -> 1023       : 334      |****************************************|
> >> >       1024 -> 2047       : 33       |***                                     |
> >> >       2048 -> 4095       : 5        |                                        |
> >> >       4096 -> 8191       : 7        |                                        |
> >> >       8192 -> 16383      : 12       |*                                       |
> >> >      16384 -> 32767      : 30       |***                                     |
> >> >      32768 -> 65535      : 21       |**                                      |
> >> >      65536 -> 131071     : 15       |*                                       |
> >> >     131072 -> 262143     : 27       |***                                     |
> >> >     262144 -> 524287     : 84       |**********                              |
> >> >     524288 -> 1048575    : 203      |************************                |
> >> >    1048576 -> 2097151    : 284      |**********************************      |
> >> >    2097152 -> 4194303    : 327      |*************************************** |
> >> >    4194304 -> 8388607    : 215      |*************************               |
> >> >    8388608 -> 16777215   : 116      |*************                           |
> >> >   16777216 -> 33554431   : 47       |*****                                   |
> >> >   33554432 -> 67108863   : 8        |                                        |
> >> >   67108864 -> 134217727  : 3        |                                        |
> >> >
> >> > The latency can reach tens of milliseconds.
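
Side note: a latency histogram in this format can be collected with, for
example, the funclatency tool from BCC. The invocation below is only an
illustrative sketch (it assumes BCC is installed, with the tool path
varying by distribution, and that free_pcppages_bulk is kprobe-able on
the running kernel); it is not necessarily the exact tooling used for
the numbers above:

    # sketch: nanosecond latency histogram of free_pcppages_bulk() for 60s
    /usr/share/bcc/tools/funclatency -d 60 free_pcppages_bulk
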
> >> >
> >> > Experimenting
> >> > =============
> >> >
> >> > vm.percpu_pagelist_high_fraction
> >> > --------------------------------
> >> >
> >> > The kernel version currently deployed in our production environment is the
> >> > stable 6.1.y, and my initial strategy involves optimizing the
> >>
> >> IMHO, we should focus on upstream activity in the cover letter and patch
> >> description. And I don't think that it's necessary to describe the
> >> alternative solution with too much details.
> >>
> >> > vm.percpu_pagelist_high_fraction parameter. By increasing the value of
> >> > vm.percpu_pagelist_high_fraction, I aim to diminish the batch size during
> >> > page draining, which subsequently leads to a substantial reduction in
> >> > latency. After setting the sysctl value to 0x7fffffff, I observed a notable
> >> > improvement in latency.
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 120      |                                        |
> >> >        256 -> 511        : 365      |*                                       |
> >> >        512 -> 1023       : 201      |                                        |
> >> >       1024 -> 2047       : 103      |                                        |
> >> >       2048 -> 4095       : 84       |                                        |
> >> >       4096 -> 8191       : 87       |                                        |
> >> >       8192 -> 16383      : 4777     |**************                          |
> >> >      16384 -> 32767      : 10572    |*******************************         |
> >> >      32768 -> 65535      : 13544    |****************************************|
> >> >      65536 -> 131071     : 12723    |*************************************   |
> >> >     131072 -> 262143     : 8604     |*************************               |
> >> >     262144 -> 524287     : 3659     |**********                              |
> >> >     524288 -> 1048575    : 921      |**                                      |
> >> >    1048576 -> 2097151    : 122      |                                        |
> >> >    2097152 -> 4194303    : 5        |                                        |
> >> >
> >> > However, augmenting vm.percpu_pagelist_high_fraction can also decrease the
> >> > pcp high watermark size to a minimum of four times the batch size. While
> >> > this could theoretically affect throughput, as highlighted by Ying[0], we
> >> > have yet to observe any significant difference in throughput within our
> >> > production environment after implementing this change.
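
Side note: the experiment described in this subsection amounts to raising
vm.percpu_pagelist_high_fraction to its maximum value. Applied at
runtime, that is roughly the following (2147483647 is the decimal form of
0x7fffffff; shown purely as an illustration, and it does not persist
across reboots):

    # sketch: shrink the pcp high watermark by maximizing the fraction
    sysctl -w vm.percpu_pagelist_high_fraction=2147483647
    # equivalent:
    echo 2147483647 > /proc/sys/vm/percpu_pagelist_high_fraction
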
> >> >
> >> > Backporting the series "mm: PCP high auto-tuning"
> >> > -------------------------------------------------
> >>
> >> Again, not upstream activity. We can describe the upstream behavior
> >> directly.
> >
> > Andrew has requested that I provide a more comprehensive analysis of
> > this issue, and in response, I have endeavored to outline all the
> > pertinent details in a thorough and detailed manner.
>
> IMHO, upstream activity can provide comprehensive analysis of the issue
> too. And, your patch has changed much from the first version. It's
> better to describe your current version.

After backporting the pcp auto-tuning series to our 6.1.y branch, the pcp
code is almost identical to what is in the upstream kernel. I have
documented the data from the backported version in detail, which gives a
clear picture of the results. Note, however, that I am unable to run an
upstream kernel directly in our production environment due to practical
constraints.

>
> >>
> >> > My second endeavor was to backport the series titled
> >> > "mm: PCP high auto-tuning"[1], which comprises nine individual patches,
> >> > into our 6.1.y stable kernel version. Subsequent to its deployment in our
> >> > production environment, I noted a pronounced reduction in latency. The
> >> > observed outcomes are as enumerated below:
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 0        |                                        |
> >> >        256 -> 511        : 0        |                                        |
> >> >        512 -> 1023       : 0        |                                        |
> >> >       1024 -> 2047       : 2        |                                        |
> >> >       2048 -> 4095       : 11       |                                        |
> >> >       4096 -> 8191       : 3        |                                        |
> >> >       8192 -> 16383      : 1        |                                        |
> >> >      16384 -> 32767      : 2        |                                        |
> >> >      32768 -> 65535      : 7        |                                        |
> >> >      65536 -> 131071     : 198      |*********                               |
> >> >     131072 -> 262143     : 530      |************************                |
> >> >     262144 -> 524287     : 824      |**************************************  |
> >> >     524288 -> 1048575    : 852      |****************************************|
> >> >    1048576 -> 2097151    : 714      |*********************************       |
> >> >    2097152 -> 4194303    : 389      |******************                      |
> >> >    4194304 -> 8388607    : 143      |******                                  |
> >> >    8388608 -> 16777215   : 29       |*                                       |
> >> >   16777216 -> 33554431   : 1        |                                        |
> >> >
> >> > Compared to the previous data, the maximum latency has been reduced to
> >> > less than 30ms.
> >>
> >> People don't care too much about page freeing latency during processes
> >> exiting. Instead, they care more about the process exiting time, that
> >> is, throughput. So, it's better to show the page allocation latency
> >> which is affected by the simultaneous processes exiting.
> >
> > I'm confused also. Is this issue really hard to understand ?
>
> IMHO, it's better to prove the issue directly. If you cannot prove it
> directly, you can try alternative one and describe why.

Not every piece of data can be gathered easily or directly. The primary
focus here is the zone->lock contention, which means measuring the
latency it introduces, and free_pcppages_bulk() is a convenient place to
measure that. This is why I chose to measure the latency of
free_pcppages_bulk() specifically.

The reason I did not measure allocation latency is that doing so would
require finding a willing participant to endure the potential delays, and
nobody volunteered. Measuring the latency of free_pcppages_bulk(), in
contrast, only requires identifying and experimenting on the side that
causes the delays, which makes it a much more practical approach.

--
Regards
Yafang