From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Miaohe Lin <linmiaohe@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] mm: page_alloc: fix freelist movement during block conversion
Date: Thu, 14 Sep 2023 10:47:49 -0400
Message-ID: <20230914144749.GF48476@cmpxchg.org>
In-Reply-To: <5911bf29-b2a0-9016-7071-68334e7d680d@suse.cz>
On Wed, Sep 13, 2023 at 09:52:17PM +0200, Vlastimil Babka wrote:
> On 9/11/23 21:41, Johannes Weiner wrote:
> > @@ -1638,26 +1629,62 @@ static int move_freepages(struct zone *zone,
> >  	return pages_moved;
> >  }
> >
> > -int move_freepages_block(struct zone *zone, struct page *page,
> > -				int migratetype, int *num_movable)
> > +static bool prep_move_freepages_block(struct zone *zone, struct page *page,
> > +				      unsigned long *start_pfn,
> > +				      unsigned long *end_pfn,
> > +				      int *num_free, int *num_movable)
> >  {
> > -	unsigned long start_pfn, end_pfn, pfn;
> > -
> > -	if (num_movable)
> > -		*num_movable = 0;
> > +	unsigned long pfn, start, end;
> >
> >  	pfn = page_to_pfn(page);
> > -	start_pfn = pageblock_start_pfn(pfn);
> > -	end_pfn = pageblock_end_pfn(pfn) - 1;
> > +	start = pageblock_start_pfn(pfn);
> > +	end = pageblock_end_pfn(pfn) - 1;
>
> >  	/* Do not cross zone boundaries */
> > -	if (!zone_spans_pfn(zone, start_pfn))
> > -		start_pfn = zone->zone_start_pfn;
> > -	if (!zone_spans_pfn(zone, end_pfn))
> > -		return 0;
> > +	if (!zone_spans_pfn(zone, start))
> > +		start = zone->zone_start_pfn;
> > +	if (!zone_spans_pfn(zone, end))
> > +		return false;
>
> This brings me back to my previous suggestion - if we update the end too,
> won't the whole "block straddles >1 zones" situation we check for go away?
>
> Hm, or is it actually done because representing the pageblock migratetype
> is a problem with multiple zones, since there's a single pageblock_bitmap
> entry per the respective pageblock range of pfns, so one zone's migratetype
> could mess with the other's? And now it matters, since we want a 100% match
> of freelist vs pageblock migratetype?
Yes, it's not safe to change a shared bitmap entry with only one of
the two zones locked.

So I think my range adjustment isn't a complete fix. It's okay for the
case I was directly encountering, where DMA starts with pfn 1 and pfn
0 belongs to nobody. But if the block straddles two genuine zones, a
race is possible.
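
To make that pfn 0 case concrete, here is a minimal userspace sketch, not
kernel code. The cut-down struct zone, the 512-page block size (4K pages,
pageblock order 9) and the zone bounds used below are assumptions for
illustration; the helper names mirror the kernel's but are reimplemented
here:

```c
#include <stdbool.h>

/* Assumed for illustration: 4K pages with pageblock order 9 (512 pages). */
#define PAGEBLOCK_NR_PAGES	512UL

/* Cut-down stand-in for the kernel's struct zone. */
struct zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start_pfn(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* First pfn past the pageblock that contains pfn (exclusive end). */
static unsigned long pageblock_end_pfn(unsigned long pfn)
{
	return pageblock_start_pfn(pfn) + PAGEBLOCK_NR_PAGES;
}

/* True if pfn lies within [zone_start_pfn, zone_start_pfn + spanned_pages). */
static bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}
```

With a zone that starts at pfn 1 and spans 4095 pages, the block containing
pfn 1 starts at pfn 0, which the zone does not span, while the last pfn of
that block, 511, is spanned. That is the situation the start adjustment
handles.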
> (I think even before this series it could have mattered for
> MIGRATE_ISOLATE, is it broken in those corner cases?)
Yes, I think this is buggy indeed.

start_isolate_page_range() calls isolate_single_pageblock() on block
boundaries. It actually does round up to the zone start if the pfn is
below it, since b2c9e2fbba32 ("mm: make alloc_contig_range work at
pageblock granularity") from Zi last year. But it will still set the
migratetype on a straddling block.

And I don't see any handling for the end of the block being in another
zone. It won't move free pages, because move_freepages_block() bails
out when the block end is outside the zone, but it appears to set the
isolate migratetype in an unlocked zone.

Since nobody has complained about this, I wonder if blocks truly
straddling two different zones aren't just rare but actually
non-existent. The DMA and DMA32 boundaries should naturally align to
multiples of the pageblock size, but there might be exceptions with
ZONE_MOVABLE. Maybe somebody remembers situations where this occurs?
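
The alignment argument can be checked with a little arithmetic. This is a
sketch assuming an x86-64 style layout with 4K pages, pageblock order 9,
ZONE_DMA ending at 16MB and ZONE_DMA32 ending at 4GB; the helper names are
hypothetical:

```c
#include <stdbool.h>

#define PAGE_SHIFT		12	/* assumed: 4K pages */
#define PAGEBLOCK_ORDER		9	/* assumed: 512 pages (2MB) per block */
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)

/* Convert a physical byte address into a page frame number. */
static unsigned long addr_to_pfn(unsigned long long addr)
{
	return (unsigned long)(addr >> PAGE_SHIFT);
}

/* A zone boundary splits no pageblock if it is a multiple of the block size. */
static bool boundary_is_block_aligned(unsigned long pfn)
{
	return (pfn & (PAGEBLOCK_NR_PAGES - 1)) == 0;
}
```

Under these assumptions both the 16MB and the 4GB boundary come out
aligned. A boundary placed, say, 100 pages past 4GB would not be, and that
is the kind of placement to look for with ZONE_MOVABLE.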
> But in that case we might not be detecting the situation properly for the
> later of the two zones in a pageblock, because if start_pfn is not spanned
> we adjust it and continue? Hmm...

I think what needs to happen is to return false in both cases and
reject operations on blocks whose pages are in two different zones.
None of the callers expect it, and none of them hold both zone locks
that would be necessary to safely move pages and adjust the
migratetype.

This would fix the isolate race, as well as the freelist race that
this series is trying to eliminate.
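
A rough sketch of that stricter check, written as a userspace mock rather
than the actual patch. The struct zone stub, the 512-page block size and
the helper name block_within_zone() are assumptions for illustration:

```c
#include <stdbool.h>

/* Assumed for illustration: 4K pages with pageblock order 9 (512 pages). */
#define PAGEBLOCK_NR_PAGES	512UL

/* Cut-down stand-in for the kernel's struct zone. */
struct zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/*
 * Hypothetical helper: report the block range around pfn only if the
 * whole block belongs to this zone. No clamping of the start; a block
 * that sticks out at either end is rejected.
 */
static bool block_within_zone(const struct zone *zone, unsigned long pfn,
			      unsigned long *start_pfn, unsigned long *end_pfn)
{
	unsigned long start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long end = start + PAGEBLOCK_NR_PAGES - 1;	/* inclusive */

	if (!zone_spans_pfn(zone, start) || !zone_spans_pfn(zone, end))
		return false;

	*start_pfn = start;
	*end_pfn = end;
	return true;
}
```

A caller would only move free pages or change the block's migratetype when
this returns true, so a single zone lock always covers every page in the
range.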

It would mean that a straddling block can still be stolen from during
fallback, but cannot be claimed entirely and will stay MOVABLE.

It's not perfect, but certainly sounds a lot more reasonable than a
double zone locking scheme for all callers.