From: Mike Kravetz <mike.kravetz@oracle.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Miaohe Lin <linmiaohe@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Mon, 18 Sep 2023 10:40:37 -0700
Message-ID: <20230918174037.GA112714@monkey>
In-Reply-To: <20230918145204.GB16104@cmpxchg.org>
On 09/18/23 10:52, Johannes Weiner wrote:
> On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
> > On 9/16/23 21:57, Mike Kravetz wrote:
> > > On 09/15/23 10:16, Johannes Weiner wrote:
> > >> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > >
> > > With the patch below applied, a slightly different workload triggers the
> > > following warnings. It seems related, and appears to go away when
> > > reverting the series.
> > >
> > > [ 331.595382] ------------[ cut here ]------------
> > > [ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
> > > [ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
> >
> > Initially I thought this demonstrates the possible race I was suggesting in
> > reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma and we
> > are trying to get a MOVABLE page from a CMA page block, which is something
> > that's normally done and the pageblock stays CMA. So yeah if the warnings
> > are to stay, they need to handle this case. Maybe the same can happen with
> > HIGHATOMIC blocks?
>
> Hm I don't think that's quite it.
>
> CMA and HIGHATOMIC have their own freelists. When MOVABLE requests dip
> into CMA and HIGHATOMIC, we explicitly pass that migratetype to
> __rmqueue_smallest(). This takes a chunk of e.g. CMA, expands the
> remainder to the CMA freelist, then returns the page. While you get a
> different mt than requested, the freelist typing should be consistent.
>
> In this splat, the migratetype passed to __rmqueue_smallest() is
> MOVABLE. There is no preceding warning from del_page_from_freelist()
> (Mike, correct me if I'm wrong), so we got a confirmed MOVABLE
> order-10 block from the MOVABLE list. So far so good. However, when we
> expand() the order-9 tail of this block to the MOVABLE list, it warns
> that its pageblock type is CMA.
>
> This means we have an order-10 page where one half is MOVABLE and the
> other is CMA.
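(Sanity-checking that arithmetic: pageblock order is 9 here, so an
order-10 buddy spans two 512-page pageblocks, each with its own stored
migratetype. The "nr=512" in the warning is the order-9 tail that
expand() puts back on the freelist, i.e. the second pageblock of the
pair, which is how one half can be typed MOVABLE while the other reads
as CMA.)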
>
> I don't see how the merging code in __free_one_page() could have done
> that. The CMA buddy would have failed the migrate_is_mergeable() test
> and we should have left them as two separate order-9s.
>
> I also don't see how the CMA setup could have done this because
> MIGRATE_CMA is set on the range before the pages are fed to the buddy.
>
> Mike, could you describe the workload that is triggering this?
This 'slightly different workload' is actually a slightly different
environment. Sorry for misspeaking! The difference is that this
environment does not use the 'alloc hugetlb gigantic pages from CMA'
(hugetlb_cma) feature that triggered the previous issue.
This is still on a 16G VM. Kernel command line here is:
"BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.6.0-rc1-next-20230913+
root=UUID=49c13301-2555-44dc-847b-caabe1d62bdf ro console=tty0
console=ttyS0,115200 audit=0 selinux=0 transparent_hugepage=always
hugetlb_free_vmemmap=on"
The workload is just running this script:
while true; do
    # allocate 4 1GB gigantic pages (via alloc_contig_pages)
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    # demote them to 2MB pages
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
    # free the resulting 2MB pages back to the buddy
    echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
done
>
> Does this reproduce instantly and reliably?
>
It is not 'instant' but will reproduce fairly reliably within a minute
or so.
Note that the 'echo 4 > .../hugepages-1048576kB/nr_hugepages' is going
to end up calling alloc_contig_pages -> alloc_contig_range. Those pages
will eventually be freed via __free_pages(folio, 9).
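If it helps, the pageblock type churn can be seen by diffing the
counts around a single cycle of the script, e.g. (untested sketch,
same sysfs files as above):

    # snapshot pageblock type counts around one alloc/demote/free cycle
    cat /proc/pagetypeinfo > /tmp/ptinfo.before
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
    echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    cat /proc/pagetypeinfo > /tmp/ptinfo.after
    diff /tmp/ptinfo.before /tmp/ptinfo.after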
> Is there high load on the system, or is it requesting the huge page
> with not much else going on?
Only the script was running.
> Do you see compact_* history in /proc/vmstat after this triggers?
As one might expect, compact_isolated increases continually during the
run.
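A simple loop is enough to watch the counters move while the
reproducer runs, e.g.:

    # sample the compact_* counters once a second during the run
    while true; do
        grep '^compact_' /proc/vmstat
        echo ----
        sleep 1
    done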
> Could you please also provide /proc/zoneinfo, /proc/pagetypeinfo and
> the hugetlb_cma= parameter you're using?
As mentioned above, hugetlb_cma is not used in this environment. Strangely
enough, this does not reproduce (easily at least) if I use hugetlb_cma as
in the previous report.
The following were captured during a run, after the WARNING triggered.
# cat /proc/zoneinfo
Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 11800
      nr_active_anon 109
      nr_inactive_file 38161
      nr_active_file 10007
      nr_unevictable 12
      nr_slab_reclaimable 2766
      nr_slab_unreclaimable 6881
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 0
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      nr_anon_pages 11750
      nr_mapped 18402
      nr_file_pages 48339
      nr_dirty 0
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem 166
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 6
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied 14766
      nr_written 7701
      nr_throttled_written 0
      nr_kernel_misc_reclaimable 0
      nr_foll_pin_acquired 96
      nr_foll_pin_released 96
      nr_kernel_stack 1816
      nr_page_table_pages 1100
      nr_sec_page_table_pages 0
      nr_swapcached 0
  pages free     3840
        boost    0
        min      21
        low      26
        high     31
        spanned  4095
        present  3998
        managed  3840
        cma      0
        protection: (0, 1908, 7923, 7923)
      nr_free_pages 3840
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock 0
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 0
      numa_miss 0
      numa_foreign 0
      numa_interleave 0
      numa_local 0
      numa_other 0
  pagesets
    cpu: 0
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 1
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 2
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 3
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
  node_unreclaimable:  0
  start_pfn:           1
Node 0, zone    DMA32
  pages free     495317
        boost    0
        min      2687
        low      3358
        high     4029
        spanned  1044480
        present  520156
        managed  496486
        cma      0
        protection: (0, 0, 6015, 6015)
      nr_free_pages 495317
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock 0
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 0
      numa_miss 0
      numa_foreign 0
      numa_interleave 0
      numa_local 0
      numa_other 0
  pagesets
    cpu: 0
              count: 913
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 1
              count: 0
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 2
              count: 0
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 3
              count: 256
              high:  1679
              batch: 63
  vm stats threshold: 30
  node_unreclaimable:  0
  start_pfn:           4096
Node 0, zone   Normal
  pages free     1360836
        boost    0
        min      8473
        low      10591
        high     12709
        spanned  1572864
        present  1572864
        managed  1552266
        cma      0
        protection: (0, 0, 0, 0)
      nr_free_pages 1360836
      nr_zone_inactive_anon 11800
      nr_zone_active_anon 109
      nr_zone_inactive_file 38161
      nr_zone_active_file 10007
      nr_zone_unevictable 12
      nr_zone_write_pending 0
      nr_mlock 12
      nr_bounce 0
      nr_zspages 3
      nr_free_cma 0
      numa_hit 10623572
      numa_miss 0
      numa_foreign 0
      numa_interleave 1357
      numa_local 6902986
      numa_other 3720586
  pagesets
    cpu: 0
              count: 156
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 1
              count: 210
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 2
              count: 4956
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 3
              count: 1
              high:  5295
              batch: 63
  vm stats threshold: 42
  node_unreclaimable:  0
  start_pfn:           1048576
Node 0, zone  Movable
  pages free     0
        boost    0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone      DMA
  pages free     0
        boost    0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone    DMA32
  pages free     0
        boost    0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone   Normal
  per-node stats
      nr_inactive_anon 15381
      nr_active_anon 81
      nr_inactive_file 66550
      nr_active_file 25965
      nr_unevictable 421
      nr_slab_reclaimable 4069
      nr_slab_unreclaimable 7836
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 0
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      nr_anon_pages 15420
      nr_mapped 24331
      nr_file_pages 92978
      nr_dirty 0
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem 100
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 11
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied 6217
      nr_written 2902
      nr_throttled_written 0
      nr_kernel_misc_reclaimable 0
      nr_foll_pin_acquired 0
      nr_foll_pin_released 0
      nr_kernel_stack 1656
      nr_page_table_pages 756
      nr_sec_page_table_pages 0
      nr_swapcached 0
  pages free     1829073
        boost    0
        min      11345
        low      14181
        high     17017
        spanned  2097152
        present  2097152
        managed  2086594
        cma      0
        protection: (0, 0, 0, 0)
      nr_free_pages 1829073
      nr_zone_inactive_anon 15381
      nr_zone_active_anon 81
      nr_zone_inactive_file 66550
      nr_zone_active_file 25965
      nr_zone_unevictable 421
      nr_zone_write_pending 0
      nr_mlock 421
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 10522401
      numa_miss 0
      numa_foreign 0
      numa_interleave 961
      numa_local 4057399
      numa_other 6465002
  pagesets
    cpu: 0
              count: 0
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 1
              count: 17
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 2
              count: 6997
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 3
              count: 0
              high:  7090
              batch: 63
  vm stats threshold: 42
  node_unreclaimable:  0
  start_pfn:           2621440
Node 1, zone  Movable
  pages free     0
        boost    0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      0      0      0      0      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Movable      1      0      1      2      2      3      3      3      4      4    480
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable    566     14     22      7      8      8      9      4      7      0      1
Node    0, zone   Normal, type      Movable    214    299    120     53     15     10      6      6      1      4   1159
Node    0, zone   Normal, type  Reclaimable      0      9     18     11      6      1      0      0      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 0, zone      DMA            1            7            0            0            0            0
Node 0, zone    DMA32            0         1016            0            0            0            0
Node 0, zone   Normal           71         2995            6            0            0            0

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    1, zone   Normal, type    Unmovable    459     12      5      6      6      5      5      5      6      2      1
Node    1, zone   Normal, type      Movable   1287    502    171     85     34     14     13      8      2      5   1861
Node    1, zone   Normal, type  Reclaimable      1      5     12      6      9      3      1      1      0      1      0
Node    1, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      3

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 1, zone   Normal          101         3977           10            0            0            8
--
Mike Kravetz