From: Oscar Salvador <osalvador@suse.de>
To: Zi Yan <ziy@nvidia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
David Hildenbrand <david@redhat.com>,
Michael Ellerman <mpe@ellerman.id.au>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
iommu@lists.linux-foundation.org,
Eric Ren <renzhengeek@gmail.com>,
Robin Murphy <robin.murphy@arm.com>,
Christoph Hellwig <hch@lst.de>, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
Date: Fri, 4 Feb 2022 14:56:52 +0100 [thread overview]
Message-ID: <Yf0wpFmtckRRzLFg@localhost.localdomain> (raw)
In-Reply-To: <20220119190623.1029355-5-zi.yan@sent.com>
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.
Hi Zi Yan,
Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:
> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> + int order, struct zone *zone)
> +{
> + unsigned long pfn;
> +
> + spin_lock(&zone->lock);
> + del_page_from_free_list(free_page, zone, order);
> + for (pfn = page_to_pfn(free_page);
> + pfn < page_to_pfn(free_page) + (1UL << order);
It migt make sense to have a end_pfn variable so that does not have to
be constantly evaluated. Or maybe the compiler is clever enough to only
evualuate it once.
> + pfn += pageblock_nr_pages) {
> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> + mt, FPI_NONE);
> + }
> + spin_unlock(&zone->lock);
It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.
> + /* Save the migratepages of the pageblocks before start and after end */
> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> + + (isolate_end - alloc_end) / pageblock_nr_pages;
> + saved_mt =
> + kmalloc_array(num_pageblock_to_save,
> + sizeof(unsigned char), GFP_KERNEL);
> + if (!saved_mt)
> + return -ENOMEM;
> +
> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.
Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I
guess we want that function to be as generic as possible.
> + /*
> + * Split free page spanning [alloc_end, isolate_end) and put the
> + * pageblocks in the right migratetype list
> + */
> + for (outer_end = alloc_end; outer_end < isolate_end;) {
> + unsigned long begin_pfn = outer_end;
> +
> + order = 0;
> + while (!PageBuddy(pfn_to_page(outer_end))) {
> + if (++order >= MAX_ORDER) {
> + outer_end = begin_pfn;
> + break;
> + }
> + outer_end &= ~0UL << order;
> + }
> +
> + if (outer_end != begin_pfn) {
> + order = buddy_order(pfn_to_page(outer_end));
> +
> + /*
> + * split the free page has start page and put the pageblocks
> + * in the right migratetype list
> + */
> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
How could this possibily happen?
> + {
> + struct page *free_page = pfn_to_page(outer_end);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> + outer_end += 1UL << order;
> + } else
> + outer_end = begin_pfn + 1;
> }
I think there are cases could optimize for. If the page has already been
split in pageblock by the outer_start loop, we could skip this outer_end
logic altogether.
E.g: An order-10 page is split in two pageblocks. There's nothing else
to be done, right? We could skip this.
--
Oscar Salvador
SUSE Labs
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
WARNING: multiple messages have this Message-ID (diff)
From: Oscar Salvador <osalvador@suse.de>
To: Zi Yan <ziy@nvidia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
David Hildenbrand <david@redhat.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
iommu@lists.linux-foundation.org,
Eric Ren <renzhengeek@gmail.com>,
Robin Murphy <robin.murphy@arm.com>,
Christoph Hellwig <hch@lst.de>, Vlastimil Babka <vbabka@suse.cz>,
Marek Szyprowski <m.szyprowski@samsung.com>
Subject: Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
Date: Fri, 4 Feb 2022 14:56:52 +0100 [thread overview]
Message-ID: <Yf0wpFmtckRRzLFg@localhost.localdomain> (raw)
In-Reply-To: <20220119190623.1029355-5-zi.yan@sent.com>
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.
Hi Zi Yan,
Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:
> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> + int order, struct zone *zone)
> +{
> + unsigned long pfn;
> +
> + spin_lock(&zone->lock);
> + del_page_from_free_list(free_page, zone, order);
> + for (pfn = page_to_pfn(free_page);
> + pfn < page_to_pfn(free_page) + (1UL << order);
It migt make sense to have a end_pfn variable so that does not have to
be constantly evaluated. Or maybe the compiler is clever enough to only
evualuate it once.
> + pfn += pageblock_nr_pages) {
> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> + mt, FPI_NONE);
> + }
> + spin_unlock(&zone->lock);
It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.
> + /* Save the migratepages of the pageblocks before start and after end */
> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> + + (isolate_end - alloc_end) / pageblock_nr_pages;
> + saved_mt =
> + kmalloc_array(num_pageblock_to_save,
> + sizeof(unsigned char), GFP_KERNEL);
> + if (!saved_mt)
> + return -ENOMEM;
> +
> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.
Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I
guess we want that function to be as generic as possible.
> + /*
> + * Split free page spanning [alloc_end, isolate_end) and put the
> + * pageblocks in the right migratetype list
> + */
> + for (outer_end = alloc_end; outer_end < isolate_end;) {
> + unsigned long begin_pfn = outer_end;
> +
> + order = 0;
> + while (!PageBuddy(pfn_to_page(outer_end))) {
> + if (++order >= MAX_ORDER) {
> + outer_end = begin_pfn;
> + break;
> + }
> + outer_end &= ~0UL << order;
> + }
> +
> + if (outer_end != begin_pfn) {
> + order = buddy_order(pfn_to_page(outer_end));
> +
> + /*
> + * split the free page has start page and put the pageblocks
> + * in the right migratetype list
> + */
> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
How could this possibily happen?
> + {
> + struct page *free_page = pfn_to_page(outer_end);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> + outer_end += 1UL << order;
> + } else
> + outer_end = begin_pfn + 1;
> }
I think there are cases could optimize for. If the page has already been
split in pageblock by the outer_start loop, we could skip this outer_end
logic altogether.
E.g: An order-10 page is split in two pageblocks. There's nothing else
to be done, right? We could skip this.
--
Oscar Salvador
SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Oscar Salvador <osalvador@suse.de>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Michael Ellerman <mpe@ellerman.id.au>,
Christoph Hellwig <hch@lst.de>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
linuxppc-dev@lists.ozlabs.org,
virtualization@lists.linux-foundation.org,
iommu@lists.linux-foundation.org,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
Eric Ren <renzhengeek@gmail.com>
Subject: Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
Date: Fri, 4 Feb 2022 14:56:52 +0100 [thread overview]
Message-ID: <Yf0wpFmtckRRzLFg@localhost.localdomain> (raw)
In-Reply-To: <20220119190623.1029355-5-zi.yan@sent.com>
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.
Hi Zi Yan,
Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:
> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> + int order, struct zone *zone)
> +{
> + unsigned long pfn;
> +
> + spin_lock(&zone->lock);
> + del_page_from_free_list(free_page, zone, order);
> + for (pfn = page_to_pfn(free_page);
> + pfn < page_to_pfn(free_page) + (1UL << order);
It migt make sense to have a end_pfn variable so that does not have to
be constantly evaluated. Or maybe the compiler is clever enough to only
evualuate it once.
> + pfn += pageblock_nr_pages) {
> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> + mt, FPI_NONE);
> + }
> + spin_unlock(&zone->lock);
It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.
> + /* Save the migratepages of the pageblocks before start and after end */
> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> + + (isolate_end - alloc_end) / pageblock_nr_pages;
> + saved_mt =
> + kmalloc_array(num_pageblock_to_save,
> + sizeof(unsigned char), GFP_KERNEL);
> + if (!saved_mt)
> + return -ENOMEM;
> +
> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.
Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I
guess we want that function to be as generic as possible.
> + /*
> + * Split free page spanning [alloc_end, isolate_end) and put the
> + * pageblocks in the right migratetype list
> + */
> + for (outer_end = alloc_end; outer_end < isolate_end;) {
> + unsigned long begin_pfn = outer_end;
> +
> + order = 0;
> + while (!PageBuddy(pfn_to_page(outer_end))) {
> + if (++order >= MAX_ORDER) {
> + outer_end = begin_pfn;
> + break;
> + }
> + outer_end &= ~0UL << order;
> + }
> +
> + if (outer_end != begin_pfn) {
> + order = buddy_order(pfn_to_page(outer_end));
> +
> + /*
> + * split the free page has start page and put the pageblocks
> + * in the right migratetype list
> + */
> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
How could this possibily happen?
> + {
> + struct page *free_page = pfn_to_page(outer_end);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> + outer_end += 1UL << order;
> + } else
> + outer_end = begin_pfn + 1;
> }
I think there are cases could optimize for. If the page has already been
split in pageblock by the outer_start loop, we could skip this outer_end
logic altogether.
E.g: An order-10 page is split in two pageblocks. There's nothing else
to be done, right? We could skip this.
--
Oscar Salvador
SUSE Labs
next prev parent reply other threads:[~2022-02-04 13:57 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-19 19:06 [PATCH v4 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-24 14:02 ` Mel Gorman
2022-01-24 14:02 ` Mel Gorman
2022-01-24 14:02 ` Mel Gorman
2022-01-24 14:02 ` Mel Gorman
2022-01-24 16:12 ` Zi Yan via iommu
2022-01-24 16:12 ` Zi Yan
2022-01-24 16:12 ` Zi Yan
2022-01-24 16:43 ` Mel Gorman
2022-01-24 16:43 ` Mel Gorman
2022-01-24 16:43 ` Mel Gorman
2022-01-24 16:43 ` Mel Gorman
2022-01-19 19:06 ` [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-25 6:23 ` Oscar Salvador
2022-01-25 6:23 ` Oscar Salvador
2022-01-25 6:23 ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-24 9:55 ` Oscar Salvador
2022-01-24 9:55 ` Oscar Salvador
2022-01-24 9:55 ` Oscar Salvador
2022-01-24 17:17 ` Zi Yan via iommu
2022-01-24 17:17 ` Zi Yan
2022-01-24 17:17 ` Zi Yan
2022-01-25 13:19 ` Oscar Salvador
2022-01-25 13:19 ` Oscar Salvador
2022-01-25 13:19 ` Oscar Salvador
2022-01-25 13:21 ` Oscar Salvador
2022-01-25 13:21 ` Oscar Salvador
2022-01-25 13:21 ` Oscar Salvador
2022-01-25 16:31 ` Zi Yan via iommu
2022-01-25 16:31 ` Zi Yan
2022-01-25 16:31 ` Zi Yan
2022-02-02 12:18 ` Oscar Salvador
2022-02-02 12:18 ` Oscar Salvador
2022-02-02 12:18 ` Oscar Salvador
2022-02-02 12:25 ` David Hildenbrand
2022-02-02 12:25 ` David Hildenbrand
2022-02-02 12:25 ` David Hildenbrand
2022-02-02 12:25 ` David Hildenbrand
2022-02-02 16:25 ` Zi Yan via iommu
2022-02-02 16:25 ` Zi Yan
2022-02-02 16:25 ` Zi Yan
2022-02-02 16:35 ` Oscar Salvador
2022-02-02 16:35 ` Oscar Salvador
2022-02-02 16:35 ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-02-04 13:56 ` Oscar Salvador [this message]
2022-02-04 13:56 ` Oscar Salvador
2022-02-04 13:56 ` Oscar Salvador
2022-02-04 15:19 ` Zi Yan via iommu
2022-02-04 15:19 ` Zi Yan
2022-02-04 15:19 ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yf0wpFmtckRRzLFg@localhost.localdomain \
--to=osalvador@suse.de \
--cc=david@redhat.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mgorman@techsingularity.net \
--cc=mpe@ellerman.id.au \
--cc=renzhengeek@gmail.com \
--cc=robin.murphy@arm.com \
--cc=vbabka@suse.cz \
--cc=virtualization@lists.linux-foundation.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.