From: Mike Kravetz <mike.kravetz@oracle.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Miaohe Lin <linmiaohe@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Mon, 18 Sep 2023 10:40:37 -0700
Message-ID: <20230918174037.GA112714@monkey>
In-Reply-To: <20230918145204.GB16104@cmpxchg.org>

On 09/18/23 10:52, Johannes Weiner wrote:
> On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
> > On 9/16/23 21:57, Mike Kravetz wrote:
> > > On 09/15/23 10:16, Johannes Weiner wrote:
> > >> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > >
> > > With the patch below applied, a slightly different workload triggers the
> > > following warnings. It seems related, and appears to go away when
> > > reverting the series.
> > >
> > > [ 331.595382] ------------[ cut here ]------------
> > > [ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
> > > [ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
> >
> > Initially I thought this demonstrates the possible race I was suggesting in
> > reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma and we
> > are trying to get a MOVABLE page from a CMA page block, which is something
> > that's normally done and the pageblock stays CMA. So yeah if the warnings
> > are to stay, they need to handle this case. Maybe the same can happen with
> > HIGHATOMIC blocks?
>
> Hm I don't think that's quite it.
>
> CMA and HIGHATOMIC have their own freelists. When MOVABLE requests dip
> into CMA and HIGHATOMIC, we explicitly pass that migratetype to
> __rmqueue_smallest(). This takes a chunk of e.g. CMA, expands the
> remainder to the CMA freelist, then returns the page. While you get a
> different mt than requested, the freelist typing should be consistent.
>
> In this splat, the migratetype passed to __rmqueue_smallest() is
> MOVABLE. There is no preceding warning from del_page_from_freelist()
> (Mike, correct me if I'm wrong), so we got a confirmed MOVABLE
> order-10 block from the MOVABLE list. So far so good. However, when we
> expand() the order-9 tail of this block to the MOVABLE list, it warns
> that its pageblock type is CMA.
>
> This means we have an order-10 page where one half is MOVABLE and the
> other is CMA.
>
> I don't see how the merging code in __free_one_page() could have done
> that. The CMA buddy would have failed the migrate_is_mergeable() test
> and we should have left it at order-9s.
>
> I also don't see how the CMA setup could have done this because
> MIGRATE_CMA is set on the range before the pages are fed to the buddy.
>
> Mike, could you describe the workload that is triggering this?

This 'slightly different workload' is actually a slightly different
environment. Sorry for misspeaking! The difference is that this
environment does not use the 'alloc hugetlb gigantic pages from CMA'
(hugetlb_cma) feature that triggered the previous issue.

This is still on a 16G VM. Kernel command line here is:
"BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.6.0-rc1-next-20230913+
root=UUID=49c13301-2555-44dc-847b-caabe1d62bdf ro console=tty0
console=ttyS0,115200 audit=0 selinux=0 transparent_hugepage=always
hugetlb_free_vmemmap=on"

The workload is just running this script:

while true; do
	echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
	echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
	echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
done

>
> Does this reproduce instantly and reliably?
>

It is not 'instant', but it reproduces fairly reliably within a minute
or so.

Note that the 'echo 4 > .../hugepages-1048576kB/nr_hugepages' is going
to end up calling alloc_contig_pages -> alloc_contig_range. Those pages
will eventually be freed, after being demoted to 2MB pages, via
__free_pages(folio, 9).

> Is there high load on the system, or is it requesting the huge page
> with not much else going on?

Only the script was running.

> Do you see compact_* history in /proc/vmstat after this triggers?

As one might expect, compact_isolated continually increases during this
run.

> Could you please also provide /proc/zoneinfo, /proc/pagetypeinfo and
> the hugetlb_cma= parameter you're using?

As mentioned above, hugetlb_cma is not used in this environment. Strangely
enough, this does not reproduce (at least not easily) if I use hugetlb_cma
as in the previous report.
The following were captured during a run, after the WARNING triggered.

# cat /proc/zoneinfo
Node 0, zone DMA
per-node stats
nr_inactive_anon 11800
nr_active_anon 109
nr_inactive_file 38161
nr_active_file 10007
nr_unevictable 12
nr_slab_reclaimable 2766
nr_slab_unreclaimable 6881
nr_isolated_anon 0
nr_isolated_file 0
workingset_nodes 0
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
nr_anon_pages 11750
nr_mapped 18402
nr_file_pages 48339
nr_dirty 0
nr_writeback 0
nr_writeback_temp 0
nr_shmem 166
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_file_hugepages 0
nr_file_pmdmapped 0
nr_anon_transparent_hugepages 6
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 14766
nr_written 7701
nr_throttled_written 0
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 96
nr_foll_pin_released 96
nr_kernel_stack 1816
nr_page_table_pages 1100
nr_sec_page_table_pages 0
nr_swapcached 0
pages free 3840
boost 0
min 21
low 26
high 31
spanned 4095
present 3998
managed 3840
cma 0
protection: (0, 1908, 7923, 7923)
nr_free_pages 3840
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 0
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 0
numa_other 0
pagesets
cpu: 0
count: 0
high: 13
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 13
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 13
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 13
batch: 1
vm stats threshold: 6
node_unreclaimable: 0
start_pfn: 1
Node 0, zone DMA32
pages free 495317
boost 0
min 2687
low 3358
high 4029
spanned 1044480
present 520156
managed 496486
cma 0
protection: (0, 0, 6015, 6015)
nr_free_pages 495317
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 0
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 0
numa_other 0
pagesets
cpu: 0
count: 913
high: 1679
batch: 63
vm stats threshold: 30
cpu: 1
count: 0
high: 1679
batch: 63
vm stats threshold: 30
cpu: 2
count: 0
high: 1679
batch: 63
vm stats threshold: 30
cpu: 3
count: 256
high: 1679
batch: 63
vm stats threshold: 30
node_unreclaimable: 0
start_pfn: 4096
Node 0, zone Normal
pages free 1360836
boost 0
min 8473
low 10591
high 12709
spanned 1572864
present 1572864
managed 1552266
cma 0
protection: (0, 0, 0, 0)
nr_free_pages 1360836
nr_zone_inactive_anon 11800
nr_zone_active_anon 109
nr_zone_inactive_file 38161
nr_zone_active_file 10007
nr_zone_unevictable 12
nr_zone_write_pending 0
nr_mlock 12
nr_bounce 0
nr_zspages 3
nr_free_cma 0
numa_hit 10623572
numa_miss 0
numa_foreign 0
numa_interleave 1357
numa_local 6902986
numa_other 3720586
pagesets
cpu: 0
count: 156
high: 5295
batch: 63
vm stats threshold: 42
cpu: 1
count: 210
high: 5295
batch: 63
vm stats threshold: 42
cpu: 2
count: 4956
high: 5295
batch: 63
vm stats threshold: 42
cpu: 3
count: 1
high: 5295
batch: 63
vm stats threshold: 42
node_unreclaimable: 0
start_pfn: 1048576
Node 0, zone Movable
pages free 0
boost 0
min 32
low 32
high 32
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0)
Node 1, zone DMA
pages free 0
boost 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0)
Node 1, zone DMA32
pages free 0
boost 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0)
Node 1, zone Normal
per-node stats
nr_inactive_anon 15381
nr_active_anon 81
nr_inactive_file 66550
nr_active_file 25965
nr_unevictable 421
nr_slab_reclaimable 4069
nr_slab_unreclaimable 7836
nr_isolated_anon 0
nr_isolated_file 0
workingset_nodes 0
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
nr_anon_pages 15420
nr_mapped 24331
nr_file_pages 92978
nr_dirty 0
nr_writeback 0
nr_writeback_temp 0
nr_shmem 100
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_file_hugepages 0
nr_file_pmdmapped 0
nr_anon_transparent_hugepages 11
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 6217
nr_written 2902
nr_throttled_written 0
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 0
nr_foll_pin_released 0
nr_kernel_stack 1656
nr_page_table_pages 756
nr_sec_page_table_pages 0
nr_swapcached 0
pages free 1829073
boost 0
min 11345
low 14181
high 17017
spanned 2097152
present 2097152
managed 2086594
cma 0
protection: (0, 0, 0, 0)
nr_free_pages 1829073
nr_zone_inactive_anon 15381
nr_zone_active_anon 81
nr_zone_inactive_file 66550
nr_zone_active_file 25965
nr_zone_unevictable 421
nr_zone_write_pending 0
nr_mlock 421
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 10522401
numa_miss 0
numa_foreign 0
numa_interleave 961
numa_local 4057399
numa_other 6465002
pagesets
cpu: 0
count: 0
high: 7090
batch: 63
vm stats threshold: 42
cpu: 1
count: 17
high: 7090
batch: 63
vm stats threshold: 42
cpu: 2
count: 6997
high: 7090
batch: 63
vm stats threshold: 42
cpu: 3
count: 0
high: 7090
batch: 63
vm stats threshold: 42
node_unreclaimable: 0
start_pfn: 2621440
Node 1, zone Movable
pages free 0
boost 0
min 32
low 32
high 32
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0)

# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 0 0 0 0 0 0 0 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Movable 1 0 1 2 2 3 3 3 4 4 480
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 566 14 22 7 8 8 9 4 7 0 1
Node 0, zone Normal, type Movable 214 299 120 53 15 10 6 6 1 4 1159
Node 0, zone Normal, type Reclaimable 0 9 18 11 6 1 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 0 1016 0 0 0 0
Node 0, zone Normal 71 2995 6 0 0 0
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 1, zone Normal, type Unmovable 459 12 5 6 6 5 5 5 6 2 1
Node 1, zone Normal, type Movable 1287 502 171 85 34 14 13 8 2 5 1861
Node 1, zone Normal, type Reclaimable 1 5 12 6 9 3 1 1 0 1 0
Node 1, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 1, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 1, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 3
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 1, zone Normal 101 3977 10 0 0 8
--
Mike Kravetz