All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Mon, 16 Oct 2023 16:26:29 -0400	[thread overview]
Message-ID: <20231016202629.GA1042487@cmpxchg.org> (raw)
In-Reply-To: <6F2C52B6-2031-4694-A124-0DFDE9F88E88@nvidia.com>

On Mon, Oct 16, 2023 at 03:49:49PM -0400, Zi Yan wrote:
> On 16 Oct 2023, at 14:51, Johannes Weiner wrote:
> 
> > On Mon, Oct 16, 2023 at 11:00:33AM -0400, Zi Yan wrote:
> >> On 16 Oct 2023, at 10:37, Johannes Weiner wrote:
> >>
> >>> On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
> >>>>> The attached patch has all the suggested changes, let me know how it
> >>>>> looks to you. Thanks.
> >>>>
> >>>> The one I sent has free page accounting issues. The attached one fixes them.
> >>>
> >>> Do you still have the warnings? I wonder what went wrong.
> >>
> >> No warnings. But something with the code:
> >>
> >> 1. in your version, split_free_page() is called without changing any pageblock
> >> migratetypes, then split_free_page() is just a no-op, since the page is
> >> just deleted from the free list, then freed via different orders. Buddy allocator
> >> will merge them back.
> >
> > Hm not quite.
> >
> > If it's the tail block of a buddy, I update its type before
> > splitting. The splitting loop looks up the type of each block for
> > sorting it onto freelists.
> >
> > If it's the head block, yes I split it first according to its old
> > type. But then I let it fall through to scanning the block, which will
> > find that buddy, update its type and move it.
> 
> That is the issue, since split_free_page() assumes the pageblocks of
> that free page have different types. It basically just free the page
> with different small orders summed up to the original free page order.
> If all pageblocks of the free page have the same migratetype, __free_one_page()
> will merge these small order pages back to the original order free page.

duh, of course, you're right. Thanks for patiently explaining this.

> >> 2. in my version, I set pageblock migratetype to new_mt before split_free_page(),
> >> but it causes free page accounting issues, since in the case of head, free pages
> >> are deleted from new_mt when they are in old_mt free list and the accounting
> >> decreases new_mt free page number instead of old_mt one.
> >
> > Right, that makes sense.
> >
> >> Basically, split_free_page() is awkward as it relies on preset migratetypes,
> >> which changes migratetypes without deleting the free pages from the list first.
> >> That is why I came up with the new split_free_page() below.
> >
> > Yeah, the in-between thing is bad. Either it fixes the migratetype
> > before deletion, or it doesn't do the deletion. I'm thinking it would
> > be simpler to move the deletion out instead.
> 
> Yes and no. After deletion, a free page no longer has PageBuddy set and
> has buddy_order information cleared. Either we reset PageBuddy and order
> to the deleted free page, or split_free_page() needs to be changed to
> accept pages without the information (basically remove the PageBuddy
> and order check code).

Good point, that requires extra care.

It's correct in the code now, but it deserves a comment, especially
because of the "buddy" naming in the new split function.

> >> Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy have more than one
> >> pageblock and in turn makes an in-use page have more than one pageblock,
> >> we will have problems. Since in isolate_single_pageblock(), an in-use page
> >> can have part of its pageblock set to a different migratetype and be freed,
> >> causing the free page with unmatched migratetypes. We might need to
> >> free pages at pageblock_order if their orders are bigger than pageblock_order.
> >
> > Is this a practical issue? You mentioned that right now only gigantic
> > pages can be larger than a pageblock, and those are freed in order-0
> > chunks.
> 
> Only if the system allocates a page (non hugetlb pages) with >pageblock_order
> and frees it with the same order. I just do not know if such pages exist on
> other arch than x86. Maybe I just think too much.

Hm, I removed LRU pages from the handling (and added the warning) but
I left in PageMovable(). The only users are z3fold, zsmalloc and
memory ballooning. AFAICS none of them can be bigger than a pageblock.
Let me remove that and add a warning for that case as well.

This way, we only attempt to migrate hugetlb, where we know the free
path - and get warnings for anything else that's larger than expected.

This seems like the safest option. On the off chance that there is a
regression, it won't jeopardize anybody's systems, while the warning
provides all the information we need to debug what's going on.

> > From a0460ad30a24cf73816ac40b262af0ba3723a242 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Date: Mon, 16 Oct 2023 12:32:21 -0400
> > Subject: [PATCH] mm: page_isolation: prepare for hygienic freelists
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

> It looks good to me. Thanks.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Thank you for all your help!


  reply	other threads:[~2023-10-16 20:26 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-11 19:41 [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene Johannes Weiner
2023-09-11 19:41 ` [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching Johannes Weiner
2023-09-11 19:59   ` Zi Yan
2023-09-11 21:09     ` Andrew Morton
2023-09-12 13:47   ` Vlastimil Babka
2023-09-12 14:50     ` Johannes Weiner
2023-09-13  9:33       ` Vlastimil Babka
2023-09-13 13:24         ` Johannes Weiner
2023-09-13 13:34           ` Vlastimil Babka
2023-09-12 15:03     ` Johannes Weiner
2023-09-14  7:29       ` Vlastimil Babka
2023-09-14  9:56   ` Mel Gorman
2023-09-27  5:42   ` Huang, Ying
2023-09-27 14:51     ` Johannes Weiner
2023-09-30  4:26       ` Huang, Ying
2023-10-02 14:58         ` Johannes Weiner
2023-09-11 19:41 ` [PATCH 2/6] mm: page_alloc: fix up block types when merging compatible blocks Johannes Weiner
2023-09-11 20:01   ` Zi Yan
2023-09-13  9:52   ` Vlastimil Babka
2023-09-14 10:00   ` Mel Gorman
2023-09-11 19:41 ` [PATCH 3/6] mm: page_alloc: move free pages when converting block during isolation Johannes Weiner
2023-09-11 20:17   ` Zi Yan
2023-09-11 20:47     ` Johannes Weiner
2023-09-11 20:50       ` Zi Yan
2023-09-13 14:31   ` Vlastimil Babka
2023-09-14 10:03   ` Mel Gorman
2023-09-11 19:41 ` [PATCH 4/6] mm: page_alloc: fix move_freepages_block() range error Johannes Weiner
2023-09-11 20:23   ` Zi Yan
2023-09-13 14:40   ` Vlastimil Babka
2023-09-14 13:37     ` Johannes Weiner
2023-09-14 10:03   ` Mel Gorman
2023-09-11 19:41 ` [PATCH 5/6] mm: page_alloc: fix freelist movement during block conversion Johannes Weiner
2023-09-13 19:52   ` Vlastimil Babka
2023-09-14 14:47     ` Johannes Weiner
2023-09-11 19:41 ` [PATCH 6/6] mm: page_alloc: consolidate free page accounting Johannes Weiner
2023-09-13 20:18   ` Vlastimil Babka
2023-09-14  4:11     ` Johannes Weiner
2023-09-14 23:52 ` [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene Mike Kravetz
2023-09-15 14:16   ` Johannes Weiner
2023-09-15 15:05     ` Mike Kravetz
2023-09-16 19:57     ` Mike Kravetz
2023-09-16 20:13       ` Andrew Morton
2023-09-18  7:16       ` Vlastimil Babka
2023-09-18 14:52         ` Johannes Weiner
2023-09-18 17:40           ` Mike Kravetz
2023-09-19  6:49             ` Johannes Weiner
2023-09-19 12:37               ` Zi Yan
2023-09-19 15:22                 ` Zi Yan
2023-09-19 18:47               ` Mike Kravetz
2023-09-19 20:57                 ` Zi Yan
2023-09-20  0:32                   ` Mike Kravetz
2023-09-20  1:38                     ` Zi Yan
2023-09-20  6:07                       ` Vlastimil Babka
2023-09-20 13:48                         ` Johannes Weiner
2023-09-20 16:04                           ` Johannes Weiner
2023-09-20 17:23                             ` Zi Yan
2023-09-21  2:31                               ` Zi Yan
2023-09-21 10:19                                 ` David Hildenbrand
2023-09-21 14:47                                   ` Zi Yan
2023-09-25 21:12                                     ` Zi Yan
2023-09-26 17:39                                       ` Johannes Weiner
2023-09-28  2:51                                         ` Zi Yan
2023-10-03  2:26                                           ` Zi Yan
2023-10-10 21:12                                             ` Johannes Weiner
2023-10-11 15:25                                               ` Johannes Weiner
2023-10-11 15:45                                                 ` Johannes Weiner
2023-10-11 15:57                                                   ` Zi Yan
2023-10-13  0:06                                               ` Zi Yan
2023-10-13 14:51                                                 ` Zi Yan
2023-10-16 13:35                                                   ` Zi Yan
2023-10-16 14:37                                                     ` Johannes Weiner
2023-10-16 15:00                                                       ` Zi Yan
2023-10-16 18:51                                                         ` Johannes Weiner
2023-10-16 19:49                                                           ` Zi Yan
2023-10-16 20:26                                                             ` Johannes Weiner [this message]
2023-10-16 20:39                                                               ` Johannes Weiner
2023-10-16 20:48                                                                 ` Zi Yan
2023-09-26 18:19                                     ` David Hildenbrand
2023-09-28  3:22                                       ` Zi Yan
2023-10-02 11:43                                         ` David Hildenbrand
2023-10-03  2:35                                           ` Zi Yan
2023-09-18  7:07     ` Vlastimil Babka
2023-09-18 14:09       ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231016202629.GA1042487@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.