Date: Tue, 10 Oct 2023 17:12:00 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Zi Yan
Cc: David Hildenbrand, Vlastimil Babka, Mike Kravetz, Andrew Morton,
 Mel Gorman, Miaohe Lin, Kefeng Wang, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Message-ID: <20231010211200.GA129823@cmpxchg.org>
In-Reply-To: <329AB331-DDC1-4074-A85E-AB5CF866CE84@nvidia.com>
Hello!

On Mon, Oct 02, 2023 at 10:26:44PM -0400, Zi Yan wrote:
> On 27 Sep 2023, at 22:51, Zi Yan wrote:
> I attached my revised patch 2 and 3 (with all the suggestions above).

Thanks! It took me a bit to read through them. It's a tricky
codebase! Some comments below.

> From 1c8f99cff5f469ee89adc33e9c9499254cad13f2 Mon Sep 17 00:00:00 2001
> From: Zi Yan
> Date: Mon, 25 Sep 2023 16:27:14 -0400
> Subject: [PATCH v2 1/2] mm: set migratetype after free pages are moved between
>  free lists.
>
> This avoids changing migratetype after move_freepages() or
> move_freepages_block(), which is error prone. It also prepares for upcoming
> changes to fix move_freepages() not moving free pages partially in the
> range.
>
> Signed-off-by: Zi Yan

This is great and indeed makes the callsites much simpler. Thanks,
I'll fold this into the series.

> @@ -1597,9 +1615,29 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn,
>  			  unsigned long end_pfn, int old_mt, int new_mt)
>  {
>  	struct page *page;
> -	unsigned long pfn;
> +	unsigned long pfn, pfn2;
>  	unsigned int order;
>  	int pages_moved = 0;
> +	unsigned long mt_changed_pfn = start_pfn - pageblock_nr_pages;
> +	unsigned long new_start_pfn = get_freepage_start_pfn(start_pfn);
> +
> +	/* split at start_pfn if it is in the middle of a free page */
> +	if (new_start_pfn != start_pfn && PageBuddy(pfn_to_page(new_start_pfn))) {
> +		struct page *new_page = pfn_to_page(new_start_pfn);
> +		int new_page_order = buddy_order(new_page);

get_freepage_start_pfn() returns start_pfn if it didn't find a large
buddy, so the buddy check shouldn't be necessary, right?

> +		if (new_start_pfn + (1 << new_page_order) > start_pfn) {

This *should* be implied according to the comments on
get_freepage_start_pfn(), but it currently isn't. Doing so would help
here, and seemingly also in alloc_contig_range().

How about this version of get_freepage_start_pfn()?

/*
 * Scan the range before this pfn for a buddy that straddles it
 */
static unsigned long find_straddling_buddy(unsigned long start_pfn)
{
	int order = 0;
	struct page *page;
	unsigned long pfn = start_pfn;

	while (!PageBuddy(page = pfn_to_page(pfn))) {
		/* Nothing found */
		if (++order > MAX_ORDER)
			return start_pfn;
		pfn &= ~0UL << order;
	}

	/*
	 * Found a preceding buddy, but does it straddle?
	 */
	if (pfn + (1 << buddy_order(page)) > start_pfn)
		return pfn;

	/* Nothing found */
	return start_pfn;
}
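As a quick sanity check of the masking walk, here is a userspace
sketch. PageBuddy() and buddy_order() are stubbed out with a single
simulated order-10 buddy at pfn 1024 (values made up for
illustration), so only the pfn arithmetic is real:

#include <stdio.h>
#include <stdbool.h>

#define MAX_ORDER	10
#define SIM_HEAD	1024UL	/* simulated buddy head, naturally aligned */
#define SIM_ORDER	10	/* covers pfns 1024-2047 */

/* stand-in for PageBuddy(pfn_to_page(pfn)) */
static bool sim_page_buddy(unsigned long pfn)
{
	return pfn == SIM_HEAD;
}

static unsigned long find_straddling_buddy(unsigned long start_pfn)
{
	int order = 0;
	unsigned long pfn = start_pfn;

	while (!sim_page_buddy(pfn)) {
		/* Nothing found */
		if (++order > MAX_ORDER)
			return start_pfn;
		/* align down to the next higher-order boundary */
		pfn &= ~0UL << order;
	}

	/* Found a preceding buddy, but does it straddle? */
	if (pfn + (1UL << SIM_ORDER) > start_pfn)
		return pfn;

	/* Nothing found */
	return start_pfn;
}

int main(void)
{
	printf("%lu\n", find_straddling_buddy(1536));	/* tail: prints 1024 */
	printf("%lu\n", find_straddling_buddy(1024));	/* head: prints 1024 */
	printf("%lu\n", find_straddling_buddy(512));	/* miss: prints 512 */
	return 0;
}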
> @@ -1614,10 +1652,43 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn,
>
>  		order = buddy_order(page);
>  		move_to_free_list(page, zone, order, old_mt, new_mt);
> +		/*
> +		 * set page migratetype 1) only after we move all free pages in
> +		 * one pageblock and 2) for all pageblocks within the page.
> +		 *
> +		 * for 1), since move_to_free_list() checks page migratetype with
> +		 * old_mt and changing one page migratetype affects all pages
> +		 * within the same pageblock, if we are moving more than
> +		 * one free pages in the same pageblock, setting migratetype
> +		 * right after first move_to_free_list() triggers the warning
> +		 * in the following move_to_free_list().
> +		 *
> +		 * for 2), when a free page order is greater than pageblock_order,
> +		 * all pageblocks within the free page need to be changed after
> +		 * move_to_free_list().

I think this can be somewhat simplified.

There are two assumptions we can make. Buddies always consist of 2^n
pages. And buddies and pageblocks are naturally aligned. This means
that if this pageblock has the start of a buddy that straddles into
the next pageblock(s), it must be the first page in the block. That
in turn means we can move the handling before the loop.

If we split first, it also makes the loop a little simpler because we
know that any buddies that start inside this block cannot extend
beyond it (due to the alignment). The loop, as it was originally
written, can remain untouched.
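That alignment claim can be brute-forced in userspace, too. Assuming
pageblock_order = 9 and MAX_ORDER = 10 for illustration, this sketch
checks that a naturally aligned buddy which extends past a pageblock
boundary always starts at the first page of its block:

#include <assert.h>
#include <stdio.h>

#define PAGEBLOCK_ORDER	9	/* assumed for illustration */
#define PAGEBLOCK_PAGES	(1UL << PAGEBLOCK_ORDER)
#define MAX_ORDER	10
#define RANGE		(1UL << 20)	/* pfns to scan */

int main(void)
{
	for (int order = 0; order <= MAX_ORDER; order++) {
		/* buddy heads are multiples of their own size */
		for (unsigned long head = 0; head < RANGE; head += 1UL << order) {
			unsigned long end = head + (1UL << order);
			unsigned long block_end =
				(head / PAGEBLOCK_PAGES + 1) * PAGEBLOCK_PAGES;

			/* straddles into the next pageblock? */
			if (end > block_end)
				/* ...then it must start the block */
				assert(head % PAGEBLOCK_PAGES == 0);
		}
	}
	printf("straddling buddies always start their pageblock\n");
	return 0;
}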
> +		 */
> +		if (pfn + (1 << order) > pageblock_end_pfn(pfn)) {
> +			for (pfn2 = pfn;
> +			     pfn2 < min_t(unsigned long,
> +					  pfn + (1 << order),
> +					  end_pfn + 1);
> +			     pfn2 += pageblock_nr_pages) {
> +				set_pageblock_migratetype(pfn_to_page(pfn2),
> +							  new_mt);
> +				mt_changed_pfn = pfn2;

Hm, this seems to assume that start_pfn to end_pfn can be more than
one block. Why is that? This function is only used on single blocks.

> +			}
> +			/* split the free page if it goes beyond the specified range */
> +			if (pfn + (1 << order) > (end_pfn + 1))
> +				split_free_page(page, order, end_pfn + 1 - pfn);
> +		}
>  		pfn += 1 << order;
>  		pages_moved += 1 << order;
>  	}
> -	set_pageblock_migratetype(pfn_to_page(start_pfn), new_mt);
> +	/* set migratetype for the remaining pageblocks */
> +	for (pfn2 = mt_changed_pfn + pageblock_nr_pages;
> +	     pfn2 <= end_pfn;
> +	     pfn2 += pageblock_nr_pages)
> +		set_pageblock_migratetype(pfn_to_page(pfn2), new_mt);

If I rework the code on the above, I'm arriving at the following:

static int move_freepages(struct zone *zone, unsigned long start_pfn,
			  unsigned long end_pfn, int old_mt, int new_mt)
{
	struct page *start_page = pfn_to_page(start_pfn);
	int pages_moved = 0;
	unsigned long pfn;

	VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1));
	VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn);

	/*
	 * A free page may be comprised of 2^n blocks, which means our
	 * block of interest could be head or tail in such a page.
	 *
	 * If we're a tail, update the type of our block, then split
	 * the page into pageblocks. The splitting will do the leg
	 * work of sorting the blocks into the right freelists.
	 *
	 * If we're a head, split the page into pageblocks first. This
	 * ensures the migratetypes still match up during the freelist
	 * removal. Then do the regular scan for buddies in the block
	 * of interest, which will handle the rest.
	 *
	 * In theory, we could try to preserve 2^1 and larger blocks
	 * that lie outside our range. In practice, MAX_ORDER is
	 * usually one or two pageblocks anyway, so don't bother.
	 *
	 * Note that this only applies to page isolation, which calls
	 * this on random blocks in the pfn range! When we move stuff
	 * from inside the page allocator, the pages are coming off
	 * the freelist (can't be tail) and multi-block pages are
	 * handled directly in the stealing code (can't be a head).
	 */

	/* We're a tail */
	pfn = find_straddling_buddy(start_pfn);
	if (pfn != start_pfn) {
		struct page *free_page = pfn_to_page(pfn);

		set_pageblock_migratetype(start_page, new_mt);
		split_free_page(free_page, buddy_order(free_page),
				pageblock_nr_pages);
		return pageblock_nr_pages;
	}

	/* We're a head */
	if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order)
		split_free_page(start_page, buddy_order(start_page),
				pageblock_nr_pages);

	/* Move buddies within the block */
	while (pfn <= end_pfn) {
		struct page *page = pfn_to_page(pfn);
		int order, nr_pages;

		if (!PageBuddy(page)) {
			pfn++;
			continue;
		}

		/* Make sure we are not inadvertently changing nodes */
		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
		VM_BUG_ON_PAGE(page_zone(page) != zone, page);

		order = buddy_order(page);
		nr_pages = 1 << order;

		move_to_free_list(page, zone, order, old_mt, new_mt);

		pfn += nr_pages;
		pages_moved += nr_pages;
	}

	set_pageblock_migratetype(start_page, new_mt);

	return pages_moved;
}

Does this look reasonable to you?

Note that the page isolation specific stuff comes first. If this code
holds up, we should be able to move it to page-isolation.c and keep
it out of the regular allocator path.

Thanks!
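P.S. To double-check the buddy scan at the end, here's a toy
userspace walk of that loop with the freelist machinery stubbed out
and a made-up block layout (an order-8 buddy at pfn 0, order-4 at
256, order-7 at 384, the rest allocated). It shows pfn advancing by
1 << order across free buddies and by one page otherwise:

#include <stdio.h>

#define BLOCK_PAGES	512UL	/* one pageblock, order 9 assumed */

/* stand-in for PageBuddy()/buddy_order(): order of a free head, or -1 */
static int sim_order(unsigned long pfn)
{
	switch (pfn) {
	case 0:   return 8;	/* pfns 0-255 */
	case 256: return 4;	/* pfns 256-271 */
	case 384: return 7;	/* pfns 384-511 */
	default:  return -1;
	}
}

int main(void)
{
	unsigned long pfn = 0, pages_moved = 0;

	while (pfn < BLOCK_PAGES) {
		int order = sim_order(pfn);

		if (order < 0) {	/* !PageBuddy(): skip one page */
			pfn++;
			continue;
		}
		/* move_to_free_list(page, zone, order, old_mt, new_mt) */
		printf("moved order-%d buddy at pfn %lu\n", order, pfn);
		pfn += 1UL << order;
		pages_moved += 1UL << order;
	}
	printf("pages_moved: %lu of %lu\n", pages_moved, BLOCK_PAGES);
	return 0;
}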