Date: Thu, 19 Oct 2023 13:12:58 +0100
From: Mel Gorman
To: Huang Ying
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Arjan Van De Ven, Vlastimil Babka, David Hildenbrand,
	Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
	Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH -V3 4/9] mm: restrict the pcp batch scale factor to avoid too long latency
Message-ID: <20231019121258.52y5o7aaivyq2ex7@techsingularity.net>
References: <20231016053002.756205-1-ying.huang@intel.com>
	<20231016053002.756205-5-ying.huang@intel.com>
In-Reply-To: <20231016053002.756205-5-ying.huang@intel.com>

On Mon, Oct 16, 2023 at 01:29:57PM +0800, Huang Ying wrote:
> In page allocator, PCP (Per-CPU Pageset) is refilled and drained in
> batches to increase page allocation throughput, reduce page
> allocation/freeing latency per page, and reduce zone lock contention.
> But too large batch size will cause too long maximal
> allocation/freeing latency, which may punish arbitrary users.  So the
> default batch size is chosen carefully (in zone_batchsize(), the value
> is 63 for zone > 1GB) to avoid that.
>
> In commit 3b12e7e97938 ("mm/page_alloc: scale the number of pages that
> are batch freed"), the batch size will be scaled for large number of
> page freeing to improve page freeing performance and reduce zone lock
> contention.  Similar optimization can be used for large number of
> pages allocation too.
>
> To find out a suitable max batch scale factor (that is, max effective
> batch size), some tests and measurement on some machines were done as
> follows.
>
> A set of debug patches are implemented as follows,
>
> - Set PCP high to be 2 * batch to reduce the effect of PCP high
>
> - Disable free batch size scaling to get the raw performance.
>
> - The code with zone lock held is extracted from rmqueue_bulk() and
>   free_pcppages_bulk() to 2 separate functions to make it easy to
>   measure the function run time with ftrace function_graph tracer.
>
> - The batch size is hard coded to be 63 (default), 127, 255, 511,
>   1023, 2047, 4095.
>
> Then will-it-scale/page_fault1 is used to generate the page
> allocation/freeing workload.  The page allocation/freeing throughput
> (page/s) is measured via will-it-scale.  The page allocation/freeing
> average latency (alloc/free latency avg, in us) and allocation/freeing
> latency at 99 percentile (alloc/free latency 99%, in us) are measured
> with ftrace function_graph tracer.
>
> The test results are as follows,
>
> Sapphire Rapids Server
> ======================
> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>            page/s      avg / us      99% / us       avg / us       99% / us
> -----  ----------  ------------  ------------  -------------  -------------
>    63    513633.4          2.33          3.57           2.67           6.83
>   127    517616.7          4.35          6.65           4.22          13.03
>   255    520822.8          8.29         13.32           7.52          25.24
>   511    524122.0         15.79         23.42          14.02          49.35
>  1023    525980.5         30.25         44.19          25.36          94.88
>  2047    526793.6         59.39         84.50          45.22         140.81
>
> Ice Lake Server
> ===============
> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>            page/s      avg / us      99% / us       avg / us       99% / us
> -----  ----------  ------------  ------------  -------------  -------------
>    63    620210.3          2.21          3.68           2.02           4.35
>   127    627003.0          4.09          6.86           3.51           8.28
>   255    630777.5          7.70         13.50           6.17          15.97
>   511    633651.5         14.85         22.62          11.66          31.08
>  1023    637071.1         28.55         42.02          20.81          54.36
>  2047    638089.7         56.54         84.06          39.28          91.68
>
> Cascade Lake Server
> ===================
> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>            page/s      avg / us      99% / us       avg / us       99% / us
> -----  ----------  ------------  ------------  -------------  -------------
>    63    404706.7          3.29          5.03           3.53           4.75
>   127    422475.2          6.12          9.09           6.36           8.76
>   255    411522.2         11.68         16.97          10.90          16.39
>   511    428124.1         22.54         31.28          19.86          32.25
>  1023    414718.4         43.39         62.52          40.00          66.33
>  2047    429848.7         86.64        120.34          71.14         106.08
>
> Comet Lake Desktop
> ==================
> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>            page/s      avg / us      99% / us       avg / us       99% / us
> -----  ----------  ------------  ------------  -------------  -------------
>    63   795183.13          2.18          3.55           2.03           3.05
>   127   803067.85          3.91          6.56           3.85           5.52
>   255   812771.10          7.35         10.80           7.14          10.20
>   511   817723.48         14.17         27.54          13.43          30.31
>  1023   818870.19         27.72         40.10          27.89          46.28
>
> Coffee Lake Desktop
> ===================
> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>            page/s      avg / us      99% / us       avg / us       99% / us
> -----  ----------  ------------  ------------  -------------  -------------
>    63    510542.8          3.13          4.40           2.48           3.43
>   127    514288.6          5.97          7.89           4.65           6.04
>   255    516889.7         11.86         15.58           8.96          12.55
>   511    519802.4         23.10         28.81          16.95          26.19
>  1023    520802.7         45.30         52.51          33.19          45.95
>  2047    519997.1         90.63        104.00          65.26          81.74
>
> From the above data, to restrict the allocation/freeing latency to be
> less than 100 us most of the time, the max batch scale factor needs to
> be less than or equal to 5.
>
> Although it is reasonable to use 5 as max batch scale factor for the
> systems tested, there are also slower systems, where a smaller value
> should be used to constrain the page allocation/freeing latency.
>
> So, in this patch, a new kconfig option (PCP_BATCH_SCALE_MAX) is added
> to set the max batch scale factor.  Its default value is 5, and
> users can reduce it when necessary.
>
> Signed-off-by: "Huang, Ying"
> Acked-by: Andrew Morton

Acked-by: Mel Gorman

-- 
Mel Gorman
SUSE Labs
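
For readers following the arithmetic behind the factor of 5, here is a minimal sketch. It assumes the effective batch size is the base batch shifted left by the scale factor; the helper name effective_batch() is hypothetical and this is not the kernel code from the patch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper, not taken from the patch: effective batch size
 * for a given scale factor, ASSUMING the scaled batch is computed as
 * base_batch << scale.
 */
static unsigned int effective_batch(unsigned int base_batch, unsigned int scale)
{
	/* Keep the shift well defined for the width of unsigned int. */
	assert(scale < sizeof(unsigned int) * CHAR_BIT);

	return base_batch << scale;
}
```

Under that assumption, effective_batch(63, 5) gives 2016 pages, just under the largest batch size (2047) for which most of the measured latencies stay below 100 us, while a factor of 6 would give 4032 pages.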