From: Usama Arif <usamaarif642@gmail.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
kernel-team@meta.com, Shuang Zhai <zhais@google.com>
Subject: Re: [PATCH v3 1/6] mm: free zapped tail pages when splitting isolated thp
Date: Thu, 15 Aug 2024 20:16:56 +0100 [thread overview]
Message-ID: <a4b1b34f-0d8c-490d-ab00-eaedbf3fe780@gmail.com> (raw)
In-Reply-To: <CAMgjq7DhdSXMHPLa3AHgAtcRN0A7pemTFFd1h19w4H_qyUMZMQ@mail.gmail.com>
On 15/08/2024 19:47, Kairui Song wrote:
> On Tue, Aug 13, 2024 at 8:03 PM Usama Arif <usamaarif642@gmail.com> wrote:
>>
>> From: Yu Zhao <yuzhao@google.com>
>>
>> If a tail page has only two references left, one inherited from the
>> isolation of its head and the other from lru_add_page_tail() which we
>> are about to drop, it means this tail page was concurrently zapped.
>> Then we can safely free it and save page reclaim or migration the
>> trouble of trying it.
>>
>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>> Tested-by: Shuang Zhai <zhais@google.com>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>> mm/huge_memory.c | 27 +++++++++++++++++++++++++++
>> 1 file changed, 27 insertions(+)
>
> Hi, Usama, Yu
>
> This commit is causing the kernel to panic very quickly with build
> kernel test on top of tmpfs with all mTHP enabled, the panic comes
> after:
>
Hi,
Thanks for pointing this out. It is a very silly bug I have introduced going from v1 page version to the folio version of the patch in v3.
Doing below over this patch will fix it:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 907813102430..a6ca454e1168 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3183,7 +3183,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
folio_clear_active(new_folio);
folio_clear_unevictable(new_folio);
- if (!folio_batch_add(&free_folios, folio)) {
+ if (!folio_batch_add(&free_folios, new_folio)) {
mem_cgroup_uncharge_folios(&free_folios);
free_unref_folios(&free_folios);
}
I will include it in the next revision.
> [ 207.147705] BUG: Bad page state in process tar pfn:14ae70
> [ 207.149376] page: refcount:3 mapcount:2 mapping:0000000000000000
> index:0x562d23b70 pfn:0x14ae70
> [ 207.151750] flags:
> 0x17ffffc0020019(locked|uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
> [ 207.154325] raw: 0017ffffc0020019 dead000000000100 dead000000000122
> 0000000000000000
> [ 207.156442] raw: 0000000562d23b70 0000000000000000 0000000300000001
> 0000000000000000
> [ 207.158561] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 207.160325] Modules linked in:
> [ 207.161194] CPU: 22 UID: 0 PID: 2650 Comm: tar Not tainted
> 6.11.0-rc3.ptch+ #136
> [ 207.163198] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> [ 207.164946] Call Trace:
> [ 207.165636] <TASK>
> [ 207.166226] dump_stack_lvl+0x53/0x70
> [ 207.167241] bad_page+0x70/0x120
> [ 207.168131] free_page_is_bad+0x5f/0x70
> [ 207.169193] free_unref_folios+0x3a5/0x620
> [ 207.170320] ? __mem_cgroup_uncharge_folios+0x7e/0xa0
> [ 207.171705] __split_huge_page+0xb02/0xcf0
> [ 207.172839] ? smp_call_function_many_cond+0x105/0x4b0
> [ 207.174250] ? __pfx_flush_tlb_func+0x10/0x10
> [ 207.175410] ? on_each_cpu_cond_mask+0x29/0x50
> [ 207.176603] split_huge_page_to_list_to_order+0x857/0x9b0
> [ 207.178052] shrink_folio_list+0x4e1/0x1200
> [ 207.179198] evict_folios+0x468/0xab0
> [ 207.180202] try_to_shrink_lruvec+0x1f3/0x280
> [ 207.181394] shrink_lruvec+0x89/0x780
> [ 207.182398] ? mem_cgroup_iter+0x66/0x290
> [ 207.183488] shrink_node+0x243/0xb00
> [ 207.184474] do_try_to_free_pages+0xbd/0x4e0
> [ 207.185621] try_to_free_mem_cgroup_pages+0x107/0x230
> [ 207.186994] try_charge_memcg+0x184/0x5d0
> [ 207.188092] charge_memcg+0x3a/0x60
> [ 207.189046] __mem_cgroup_charge+0x2c/0x80
> [ 207.190162] shmem_alloc_and_add_folio+0x1a3/0x470
> [ 207.191469] shmem_get_folio_gfp+0x24a/0x670
> [ 207.192635] shmem_write_begin+0x56/0xd0
> [ 207.193703] generic_perform_write+0x140/0x330
> [ 207.194919] shmem_file_write_iter+0x89/0x90
> [ 207.196082] vfs_write+0x2f3/0x420
> [ 207.197019] ksys_write+0x5d/0xd0
> [ 207.197914] do_syscall_64+0x47/0x110
> [ 207.198915] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 207.200293] RIP: 0033:0x7f2e6099c784
> [ 207.201278] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 08 0e 00 00 74 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 $
> 8 89 e5 48 83 ec 20 48 89
> [ 207.206280] RSP: 002b:00007ffdb1a0e7d8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [ 207.208312] RAX: ffffffffffffffda RBX: 00000000000005e7 RCX: 00007f2e6099c784
> [ 207.210225] RDX: 00000000000005e7 RSI: 0000562d23b77000 RDI: 0000000000000004
> [ 207.212145] RBP: 00007ffdb1a0e820 R08: 00000000000005e7 R09: 0000000000000007
> [ 207.214064] R10: 0000000000000180 R11: 0000000000000202 R12: 0000562d23b77000
> [ 207.215974] R13: 0000000000000004 R14: 00000000000005e7 R15: 0000000000000000
> [ 207.217888] </TASK>
>
> Test is done using ZRAM as SWAP, 1G memcg, and run:
> cd /mnt/tmpfs
> time tar zxf "$linux_src"
> make -j64 clean
> make defconfig
> /usr/bin/time make -j64
>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 04ee8abd6475..85a424e954be 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3059,7 +3059,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> unsigned int new_nr = 1 << new_order;
>> int order = folio_order(folio);
>> unsigned int nr = 1 << order;
>> + struct folio_batch free_folios;
>>
>> + folio_batch_init(&free_folios);
>> /* complete memcg works before add pages to LRU */
>> split_page_memcg(head, order, new_order);
>>
>> @@ -3143,6 +3145,26 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> if (subpage == page)
>> continue;
>> folio_unlock(new_folio);
>> + /*
>> + * If a folio has only two references left, one inherited
>> + * from the isolation of its head and the other from
>> + * lru_add_page_tail() which we are about to drop, it means this
>> + * folio was concurrently zapped. Then we can safely free it
>> + * and save page reclaim or migration the trouble of trying it.
>> + */
>> + if (list && folio_ref_freeze(new_folio, 2)) {
>> + VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
>> + VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
>> + VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
>> +
>> + folio_clear_active(new_folio);
>> + folio_clear_unevictable(new_folio);
>> + if (!folio_batch_add(&free_folios, folio)) {
>> + mem_cgroup_uncharge_folios(&free_folios);
>> + free_unref_folios(&free_folios);
>> + }
>> + continue;
>> + }
>>
>> /*
>> * Subpages may be freed if there wasn't any mapping
>> @@ -3153,6 +3175,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> */
>> free_page_and_swap_cache(subpage);
>> }
>> +
>> + if (free_folios.nr) {
>> + mem_cgroup_uncharge_folios(&free_folios);
>> + free_unref_folios(&free_folios);
>> + }
>> }
>>
>> /* Racy check whether the huge page can be split */
>> --
>> 2.43.5
>>
>>
next prev parent reply other threads:[~2024-08-15 19:16 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-13 12:02 [PATCH v3 0/6] mm: split underutilized THPs Usama Arif
2024-08-13 12:02 ` [PATCH v3 1/6] mm: free zapped tail pages when splitting isolated thp Usama Arif
2024-08-15 18:47 ` Kairui Song
2024-08-15 19:16 ` Usama Arif [this message]
2024-08-16 16:55 ` Kairui Song
2024-08-16 17:02 ` Usama Arif
2024-08-16 18:11 ` Kairui Song
2024-08-13 12:02 ` [PATCH v3 2/6] mm: remap unused subpages to shared zeropage " Usama Arif
2024-08-13 12:02 ` [PATCH v3 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage Usama Arif
2024-08-13 12:02 ` [PATCH v3 4/6] mm: Introduce a pageflag for partially mapped folios Usama Arif
2024-08-14 3:30 ` Yu Zhao
2024-08-14 10:20 ` Usama Arif
2024-08-14 10:44 ` Barry Song
2024-08-14 10:52 ` Barry Song
2024-08-14 11:11 ` Usama Arif
2024-08-14 11:20 ` Barry Song
2024-08-14 11:26 ` Barry Song
2024-08-14 11:30 ` Usama Arif
2024-08-14 11:10 ` Barry Song
2024-08-14 11:20 ` Usama Arif
2024-08-14 11:23 ` Barry Song
2024-08-14 12:36 ` Usama Arif
2024-08-14 23:05 ` Barry Song
2024-08-15 15:25 ` Usama Arif
2024-08-15 23:30 ` Andrew Morton
2024-08-16 2:50 ` Yu Zhao
2024-08-15 16:33 ` David Hildenbrand
2024-08-15 17:10 ` Usama Arif
2024-08-15 21:06 ` Barry Song
2024-08-15 21:08 ` David Hildenbrand
2024-08-16 15:44 ` Matthew Wilcox
2024-08-16 16:08 ` Usama Arif
2024-08-16 16:28 ` Matthew Wilcox
2024-08-16 16:41 ` Usama Arif
2024-08-13 12:02 ` [PATCH v3 5/6] mm: split underutilized THPs Usama Arif
2024-08-13 12:02 ` [PATCH v3 6/6] mm: add sysfs entry to disable splitting " Usama Arif
2024-08-13 17:22 ` [PATCH v3 0/6] mm: split " Andi Kleen
2024-08-14 10:13 ` Usama Arif
2024-08-18 5:13 ` Hugh Dickins
2024-08-18 7:45 ` David Hildenbrand
2024-08-19 2:38 ` Usama Arif
2024-08-19 2:36 ` Usama Arif
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a4b1b34f-0d8c-490d-ab00-eaedbf3fe780@gmail.com \
--to=usamaarif642@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=cerasuolodomenico@gmail.com \
--cc=corbet@lwn.net \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=riel@surriel.com \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=willy@infradead.org \
--cc=yuzhao@google.com \
--cc=zhais@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.