Date: Tue, 12 Sep 2023 10:50:28 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka
Cc: Andrew Morton, Mel Gorman, Miaohe Lin, Kefeng Wang, Zi Yan,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching
Message-ID: <20230912145028.GA3228@cmpxchg.org>
References: <20230911195023.247694-1-hannes@cmpxchg.org>
 <20230911195023.247694-2-hannes@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Sep 12, 2023 at 03:47:45PM +0200, Vlastimil Babka wrote:
> On 9/11/23 21:41, Johannes Weiner wrote:
> > The idea behind the cache is to save get_pageblock_migratetype()
> > lookups during bulk freeing. A microbenchmark suggests this isn't
> > helping, though. The pcp migratetype can get stale, which means that
> > bulk freeing has an extra branch to check if the pageblock was
> > isolated while on the pcp.
> > 
> > While the variance overlaps, the cache write and the branch seem to
> > make this a net negative. The following test allocates and frees
> > batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
> > 
> > Before:
> >           8,668.48 msec task-clock           #   99.735 CPUs utilized      ( +- 2.90% )
> >                  19      context-switches    #    4.341 /sec               ( +- 3.24% )
> >                   0      cpu-migrations      #    0.000 /sec
> >              17,440      page-faults         #    3.984 K/sec              ( +- 2.90% )
> >      41,758,692,473      cycles              #    9.541 GHz                ( +- 2.90% )
> >     126,201,294,231      instructions        #    5.98  insn per cycle     ( +- 2.90% )
> >      25,348,098,335      branches            #    5.791 G/sec              ( +- 2.90% )
> >          33,436,921      branch-misses       #    0.26% of all branches    ( +- 2.90% )
> > 
> >           0.0869148 +- 0.0000302 seconds time elapsed  ( +- 0.03% )
> > 
> > After:
> >           8,444.81 msec task-clock           #   99.726 CPUs utilized      ( +- 2.90% )
> >                  22      context-switches    #    5.160 /sec               ( +- 3.23% )
> >                   0      cpu-migrations      #    0.000 /sec
> >              17,443      page-faults         #    4.091 K/sec              ( +- 2.90% )
> >      40,616,738,355      cycles              #    9.527 GHz                ( +- 2.90% )
> >     126,383,351,792      instructions        #    6.16  insn per cycle     ( +- 2.90% )
> >      25,224,985,153      branches            #    5.917 G/sec              ( +- 2.90% )
> >          32,236,793      branch-misses       #    0.25% of all branches    ( +- 2.90% )
> > 
> >           0.0846799 +- 0.0000412 seconds time elapsed  ( +- 0.05% )
> > 
> > A side effect is that this also ensures that pages whose pageblock
> > gets stolen while on the pcplist end up on the right freelist and we
> > don't perform potentially type-incompatible buddy merges (or skip
> > merges when we shouldn't), which is likely beneficial to long-term
> > fragmentation management, although the effects would be harder to
> > measure. Settle for simpler and faster code as justification here.
> 
> Makes sense to me, so
> 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Reviewed-by: Vlastimil Babka

Thanks!

> > @@ -1577,7 +1556,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> >  			continue;
> >  		del_page_from_free_list(page, zone, current_order);
> >  		expand(zone, page, order, current_order, migratetype);
> > -		set_pcppage_migratetype(page, migratetype);
> 
> Hm interesting, just noticed that __rmqueue_fallback() never did this
> AFAICS, sounds like a bug.

I don't quite follow. Which part?

Keep in mind that at this point __rmqueue_fallback() doesn't return a
page. It just moves pages to the desired freelist, and then
__rmqueue_smallest() gets called again.
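For reference, the call flow at this point looks roughly like this (a
condensed sketch of __rmqueue() as of the base of this series; CMA and
allocation-flag handling elided, so not the verbatim kernel code):

	static struct page *__rmqueue(struct zone *zone, unsigned int order,
				      int migratetype, unsigned int alloc_flags)
	{
		struct page *page;

	retry:
		/*
		 * Takes a page off the freelist; this is also where the
		 * set_pcppage_migratetype() call removed above lived.
		 */
		page = __rmqueue_smallest(zone, order, migratetype);
		if (unlikely(!page)) {
			/*
			 * __rmqueue_fallback() doesn't hand back a page; it
			 * only steals blocks over to the desired freelist...
			 */
			if (__rmqueue_fallback(zone, order, migratetype,
					       alloc_flags))
				/* ...and __rmqueue_smallest() runs again. */
				goto retry;
		}
		return page;
	}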
This changes in 5/6, but until now at least all of the above would
apply to fallback pages.

> > @@ -2145,7 +2123,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> >  		 * pages are ordered properly.
> >  		 */
> >  		list_add_tail(&page->pcp_list, list);
> > -		if (is_migrate_cma(get_pcppage_migratetype(page)))
> > +		if (is_migrate_cma(get_pageblock_migratetype(page)))
> >  			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> >  					      -(1 << order));
> 
> This is potentially a source of overhead, I assume patch 6/6 might
> change that.

Yes, 6/6 removes it altogether. But the test results in this patch's
changelog are from this patch in isolation, so it doesn't appear to be
a concern even on its own.

> > @@ -2457,7 +2423,7 @@ void free_unref_page_list(struct list_head *list)
> >  		 * Free isolated pages directly to the allocator, see
> >  		 * comment in free_unref_page.
> >  		 */
> > -		migratetype = get_pcppage_migratetype(page);
> > +		migratetype = get_pfnblock_migratetype(page, pfn);
> >  		if (unlikely(is_migrate_isolate(migratetype))) {
> >  			list_del(&page->lru);
> >  			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
> 
> I think after this change we should move the isolated pages handling to
> the second loop below, so that we wouldn't have to call
> get_pfnblock_migratetype() twice per page. Dunno yet if some later patch
> does that. It would need to unlock pcp when necessary.

That sounds like a great idea. Something like the following? Lightly
tested. If you're good with it, I'll beat some more on it and submit
it as a follow-up.

---

From 429d13322819ab38b3ba2fad6d1495997819ccc2 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 12 Sep 2023 10:16:10 -0400
Subject: [PATCH] mm: page_alloc: optimize free_unref_page_list()

Move direct freeing of isolated pages to the lock-breaking block in
the second loop. This saves an unnecessary migratetype reassessment.

Minor comment and local variable scoping cleanups.

Suggested-by: Vlastimil Babka
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 49 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e3f1c777feed..9cad31de1bf5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2408,48 +2408,41 @@ void free_unref_page_list(struct list_head *list)
 	struct per_cpu_pages *pcp = NULL;
 	struct zone *locked_zone = NULL;
 	int batch_count = 0;
-	int migratetype;
-
-	/* Prepare pages for freeing */
-	list_for_each_entry_safe(page, next, list, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		if (!free_pages_prepare(page, 0, FPI_NONE)) {
+	list_for_each_entry_safe(page, next, list, lru)
+		if (!free_pages_prepare(page, 0, FPI_NONE))
 			list_del(&page->lru);
-			continue;
-		}
-
-		/*
-		 * Free isolated pages directly to the allocator, see
-		 * comment in free_unref_page.
-		 */
-		migratetype = get_pfnblock_migratetype(page, pfn);
-		if (unlikely(is_migrate_isolate(migratetype))) {
-			list_del(&page->lru);
-			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
-			continue;
-		}
-	}
 
 	list_for_each_entry_safe(page, next, list, lru) {
 		unsigned long pfn = page_to_pfn(page);
 		struct zone *zone = page_zone(page);
+		int migratetype;
 
 		list_del(&page->lru);
 		migratetype = get_pfnblock_migratetype(page, pfn);
 
 		/*
-		 * Either different zone requiring a different pcp lock or
-		 * excessive lock hold times when freeing a large list of
-		 * pages.
+		 * Zone switch, batch complete, or non-pcp freeing?
+		 * Drop the pcp lock and evaluate.
 		 */
-		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
+		if (unlikely(zone != locked_zone ||
+			     batch_count == SWAP_CLUSTER_MAX ||
+			     is_migrate_isolate(migratetype))) {
 			if (pcp) {
 				pcp_spin_unlock(pcp);
 				pcp_trylock_finish(UP_flags);
+				locked_zone = NULL;
 			}
-			batch_count = 0;
+			/*
+			 * Free isolated pages directly to the
+			 * allocator, see comment in free_unref_page.
+			 */
+			if (is_migrate_isolate(migratetype)) {
+				free_one_page(zone, page, pfn, 0,
+					      migratetype, FPI_NONE);
+				continue;
+			}
 
 			/*
 			 * trylock is necessary as pages may be getting freed
@@ -2459,12 +2452,12 @@ void free_unref_page_list(struct list_head *list)
 			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
-				free_one_page(zone, page, pfn,
-					      0, migratetype, FPI_NONE);
-				locked_zone = NULL;
+				free_one_page(zone, page, pfn, 0,
+					      migratetype, FPI_NONE);
 				continue;
 			}
 			locked_zone = zone;
+			batch_count = 0;
 		}
 
 		/*
-- 
2.42.0