Date: Wed, 27 Sep 2023 10:51:15 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Huang, Ying"
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Miaohe Lin, Kefeng Wang,
	Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching
Message-ID: <20230927145115.GA365513@cmpxchg.org>
References: <20230911195023.247694-1-hannes@cmpxchg.org>
	<20230911195023.247694-2-hannes@cmpxchg.org>
	<87y1gsrx32.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87y1gsrx32.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Sep 27, 2023 at 01:42:25PM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@cmpxchg.org> writes:
>
> > The idea behind the cache is to save get_pageblock_migratetype()
> > lookups during bulk freeing. A microbenchmark suggests this isn't
> > helping, though. The pcp migratetype can get stale, which means that
> > bulk freeing has an extra branch to check if the pageblock was
> > isolated while on the pcp.
> >
> > While the variance overlaps, the cache write and the branch seem to
> > make this a net negative. The following test allocates and frees
> > batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
> >
> > Before:
> >          8,668.48 msec task-clock        #  99.735 CPUs utilized     ( +-  2.90% )
> >                19      context-switches  #   4.341 /sec              ( +-  3.24% )
> >                 0      cpu-migrations    #   0.000 /sec
> >            17,440      page-faults       #   3.984 K/sec             ( +-  2.90% )
> >    41,758,692,473      cycles            #   9.541 GHz               ( +-  2.90% )
> >   126,201,294,231      instructions      #   5.98  insn per cycle    ( +-  2.90% )
> >    25,348,098,335      branches          #   5.791 G/sec             ( +-  2.90% )
> >        33,436,921      branch-misses     #   0.26% of all branches   ( +-  2.90% )
> >
> >         0.0869148 +- 0.0000302 seconds time elapsed  ( +-  0.03% )
> >
> > After:
> >          8,444.81 msec task-clock        #  99.726 CPUs utilized     ( +-  2.90% )
> >                22      context-switches  #   5.160 /sec              ( +-  3.23% )
> >                 0      cpu-migrations    #   0.000 /sec
> >            17,443      page-faults       #   4.091 K/sec             ( +-  2.90% )
> >    40,616,738,355      cycles            #   9.527 GHz               ( +-  2.90% )
> >   126,383,351,792      instructions      #   6.16  insn per cycle    ( +-  2.90% )
> >    25,224,985,153      branches          #   5.917 G/sec             ( +-  2.90% )
> >        32,236,793      branch-misses     #   0.25% of all branches   ( +-  2.90% )
> >
> >         0.0846799 +- 0.0000412 seconds time elapsed  ( +-  0.05% )
> >
> > A side effect is that this also ensures that pages whose pageblock
> > gets stolen while on the pcplist end up on the right freelist, and we
> > don't perform potentially type-incompatible buddy merges (or skip
> > merges when we shouldn't), which is likely beneficial to long-term
> > fragmentation management, although the effects would be harder to
> > measure. Settle for simpler and faster code as justification here.
>
> I suspected that the PCP allocation/freeing path may be affected (that
> is, when the allocation/freeing batch is smaller than the PCP high
> mark). So I tested one-process will-it-scale/page_fault1 with sysctl
> percpu_pagelist_high_fraction=8, so that pages are allocated/freed
> from/to the PCP only. The test results are as follows:
>
> Before:
> will-it-scale.1.processes                      618364.3  (+-  0.075%)
> perf-profile.children.get_pfnblock_flags_mask       0.13 (+-  9.350%)
>
> After:
> will-it-scale.1.processes                      616512.0  (+-  0.057%)
> perf-profile.children.get_pfnblock_flags_mask       0.41 (+- 22.44%)
>
> The change isn't large: -0.3%. Perf profiling shows that the cycles%
> of get_pfnblock_flags_mask() increases.

Ah, this is going through the free_unref_page_list() path that
Vlastimil had pointed out as well. I made another change on top that
eliminates the second lookup.
After that, both pcp fast paths do the same number of lookups as
before: one. This fixes the regression for me. Would you mind
confirming this as well?

--
From f5d032019ed832a1a50454347a33b00ca6abeb30 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 15 Sep 2023 16:03:24 -0400
Subject: [PATCH] mm: page_alloc: optimize free_unref_page_list()

Move direct freeing of isolated pages to the lock-breaking block in
the second loop. This saves an unnecessary migratetype reassessment.

Minor comment and local variable scoping cleanups.

Suggested-by: Vlastimil Babka
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 44 ++++++++++++++++++--------------------------
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bfffc1af94cd..665930ffe22a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2466,48 +2466,40 @@ void free_unref_page_list(struct list_head *list)
 	struct per_cpu_pages *pcp = NULL;
 	struct zone *locked_zone = NULL;
 	int batch_count = 0;
-	int migratetype;
-
-	/* Prepare pages for freeing */
-	list_for_each_entry_safe(page, next, list, lru) {
-		unsigned long pfn = page_to_pfn(page);
-
-		if (!free_pages_prepare(page, 0, FPI_NONE)) {
-			list_del(&page->lru);
-			continue;
-		}
-		/*
-		 * Free isolated pages directly to the allocator, see
-		 * comment in free_unref_page.
-		 */
-		migratetype = get_pfnblock_migratetype(page, pfn);
-		if (unlikely(is_migrate_isolate(migratetype))) {
+	list_for_each_entry_safe(page, next, list, lru)
+		if (!free_pages_prepare(page, 0, FPI_NONE))
 			list_del(&page->lru);
-			free_one_page(page_zone(page), page, pfn, 0, FPI_NONE);
-			continue;
-		}
-	}
 
 	list_for_each_entry_safe(page, next, list, lru) {
 		unsigned long pfn = page_to_pfn(page);
 		struct zone *zone = page_zone(page);
+		int migratetype;
 
 		list_del(&page->lru);
 		migratetype = get_pfnblock_migratetype(page, pfn);
 
 		/*
-		 * Either different zone requiring a different pcp lock or
-		 * excessive lock hold times when freeing a large list of
-		 * pages.
+		 * Zone switch, batch complete, or non-pcp freeing?
+		 * Drop the pcp lock and evaluate.
 		 */
-		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
+		if (unlikely(zone != locked_zone ||
+			     batch_count == SWAP_CLUSTER_MAX ||
+			     is_migrate_isolate(migratetype))) {
 			if (pcp) {
 				pcp_spin_unlock(pcp);
 				pcp_trylock_finish(UP_flags);
+				locked_zone = NULL;
 			}
-			batch_count = 0;
 
+			/*
+			 * Free isolated pages directly to the
+			 * allocator, see comment in free_unref_page.
+			 */
+			if (is_migrate_isolate(migratetype)) {
+				free_one_page(zone, page, pfn, 0, FPI_NONE);
+				continue;
+			}
+
 			/*
 			 * trylock is necessary as pages may be getting freed
@@ -2518,10 +2510,10 @@ void free_unref_page_list(struct list_head *list)
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
 				free_one_page(zone, page, pfn, 0, FPI_NONE);
-				locked_zone = NULL;
 				continue;
 			}
 			locked_zone = zone;
+			batch_count = 0;
 		}
 
 		/*
-- 
2.42.0
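
[Editor's note] The patch boils the hot path down to one migratetype
lookup per page and folds the rare cases (zone switch, full batch,
isolated pageblock) into a single unlikely lock-breaking branch. Below
is a minimal standalone C sketch of that control-flow pattern, for
readers following along outside a kernel tree. It is illustrative
only: fake_page, free_list(), and BATCH_MAX (standing in for
SWAP_CLUSTER_MAX) are made-up names, not kernel API.

#include <stdio.h>
#include <stdbool.h>

#define BATCH_MAX 32			/* stand-in for SWAP_CLUSTER_MAX */

struct fake_page { int zone; bool isolated; };

static void free_list(struct fake_page *pages, int n)
{
	int locked_zone = -1;		/* -1: no "pcp lock" held */
	int batch_count = 0;

	for (int i = 0; i < n; i++) {
		struct fake_page *page = &pages[i];
		/* one "migratetype" lookup per page, shared by both tests */
		bool isolated = page->isolated;

		/* all rare cases funnel through one lock-breaking branch */
		if (page->zone != locked_zone ||
		    batch_count == BATCH_MAX || isolated) {
			if (locked_zone != -1) {
				printf("unlock zone %d\n", locked_zone);
				locked_zone = -1;
			}
			/* isolated pages bypass the pcp entirely */
			if (isolated) {
				printf("free page %d directly\n", i);
				continue;
			}
			printf("lock zone %d\n", page->zone);
			locked_zone = page->zone;
			batch_count = 0;
		}
		printf("pcp-free page %d into zone %d\n", i, page->zone);
		batch_count++;
	}
	if (locked_zone != -1)
		printf("unlock zone %d\n", locked_zone);
}

int main(void)
{
	struct fake_page pages[] = {
		{ .zone = 0 }, { .zone = 0 },
		{ .zone = 1 }, { .zone = 1, .isolated = true },
		{ .zone = 1 },
	};

	free_list(pages, 5);
	return 0;
}

The design point this mirrors: the common case pays only the lookup
and one usually-not-taken branch, while the bookkeeping for the
uncommon cases lives behind that branch. That is what lets the patch
drop the separate first-pass isolation check, and with it the second
get_pfnblock_migratetype() lookup per page.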