* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
[not found] ` <20240822135018.1931258-5-kernel@pankajraghav.com>
@ 2024-08-29 10:51 ` Sven Schnelle
2024-08-29 18:46 ` Luis Chamberlain
0 siblings, 1 reply; 12+ messages in thread
From: Sven Schnelle @ 2024-08-29 10:51 UTC (permalink / raw)
To: Pankaj Raghav (Samsung)
Cc: brauner, akpm, chandan.babu, linux-fsdevel, djwong, hare,
gost.dev, linux-xfs, hch, david, Zi Yan, yang, linux-kernel,
linux-mm, willy, john.g.garry, cl, p.raghav, mcgrof, ryan.roberts,
David Howells, linux-s390
Hi,
"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com> writes:
> From: Luis Chamberlain <mcgrof@kernel.org>
>
> split_folio() and split_folio_to_list() assume order 0, to support
> minorder for non-anonymous folios, we must expand these to check the
> folio mapping order and use that.
>
> Set new_order to be at least minimum folio order if it is set in
> split_huge_page_to_list() so that we can maintain minimum folio order
> requirement in the page cache.
>
> Update the debugfs write files used for testing to ensure the order
> is respected as well. We simply enforce the min order when a file
> mapping is used.
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: David Howells <dhowells@redhat.com>
This causes the following warning on s390 with linux-next starting from
next-20240827:
[ 112.690518] BUG: Bad page map in process ksm01 pte:a5801317 pmd:99054000
[ 112.690531] page: refcount:0 mapcount:-1 mapping:0000000000000000 index:0x3ff86102 pfn:0xa5801
[ 112.690536] flags: 0x3ffff00000000004(referenced|node=0|zone=1|lastcpupid=0x1ffff)
[ 112.690543] raw: 3ffff00000000004 0000001d47439e30 0000001d47439e30 0000000000000000
[ 112.690546] raw: 000000003ff86102 0000000000000000 fffffffe00000000 0000000000000000
[ 112.690548] page dumped because: bad pte
[ 112.690549] addr:000003ff86102000 vm_flags:88100073 anon_vma:000000008c8e46e8 mapping:0000000000000000 index:3ff86102
[ 112.690553] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 112.690561] CPU: 1 UID: 0 PID: 604 Comm: ksm01 Not tainted 6.11.0-rc5-next-20240827-dirty #1441
[ 112.690565] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[ 112.690568] Call Trace:
[ 112.690571] [<000003ffe0eb77fe>] dump_stack_lvl+0x76/0xa0
[ 112.690579] [<000003ffe03f4a90>] print_bad_pte+0x280/0x2d0
[ 112.690584] [<000003ffe03f7654>] zap_present_ptes.isra.0+0x5c4/0x870
[ 112.690598] [<000003ffe03f7a46>] zap_pte_range+0x146/0x3d0
[ 112.690601] [<000003ffe03f7f1c>] zap_p4d_range+0x24c/0x4b0
[ 112.690603] [<000003ffe03f84ea>] unmap_page_range+0xea/0x2c0
[ 112.690605] [<000003ffe03f8754>] unmap_single_vma.isra.0+0x94/0xf0
[ 112.690607] [<000003ffe03f8866>] unmap_vmas+0xb6/0x1a0
[ 112.690609] [<000003ffe0405724>] exit_mmap+0xc4/0x3e0
[ 112.690613] [<000003ffe0154aa2>] mmput+0x72/0x170
[ 112.690616] [<000003ffe015e2c6>] exit_mm+0xd6/0x150
[ 112.690618] [<000003ffe015e52c>] do_exit+0x1ec/0x490
[ 112.690620] [<000003ffe015e9a4>] do_group_exit+0x44/0xc0
[ 112.690621] [<000003ffe016f000>] get_signal+0x7f0/0x800
[ 112.690624] [<000003ffe0108614>] arch_do_signal_or_restart+0x74/0x320
[ 112.690628] [<000003ffe020c876>] syscall_exit_to_user_mode_work+0xe6/0x170
[ 112.690632] [<000003ffe0eb7c04>] __do_syscall+0xd4/0x1c0
[ 112.690634] [<000003ffe0ec303c>] system_call+0x74/0x98
[ 112.690638] Disabling lock debugging due to kernel taint
To reproduce, running the ksm01 testsuite from ltp seems to be
enough. The splat is always triggered immediately. The output from ksm01
is:
tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
tst_test.c:1809: TINFO: LTP version: 20240524-208-g6c3293c6f
tst_test.c:1813: TINFO: Tested kernel: 6.11.0-rc5-next-20240827 #1440 SMP Thu Aug 29 12:13:28 CEST 2024 s390x
tst_test.c:1652: TINFO: Timeout per run is 0h 00m 30s
mem.c:422: TINFO: wait for all children to stop.
mem.c:388: TINFO: child 0 stops.
mem.c:388: TINFO: child 1 stops.
mem.c:388: TINFO: child 2 stops.
mem.c:495: TINFO: KSM merging...
mem.c:434: TINFO: resume all children.
mem.c:422: TINFO: wait for all children to stop.
mem.c:344: TINFO: child 0 continues...
mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
mem.c:344: TINFO: child 1 continues...
mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
mem.c:344: TINFO: child 2 continues...
mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
mem.c:400: TINFO: child 1 stops.
mem.c:400: TINFO: child 2 stops.
mem.c:400: TINFO: child 0 stops.
Test timeouted, sending SIGKILL!
tst_test.c:1700: TINFO: Killed the leftover descendant processes
tst_test.c:1706: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1708: TBROK: Test killed! (timeout?)
Thanks
Sven
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 10:51 ` [PATCH v13 04/10] mm: split a folio in minimum folio order chunks Sven Schnelle
@ 2024-08-29 18:46 ` Luis Chamberlain
2024-08-29 19:55 ` Matthew Wilcox
0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2024-08-29 18:46 UTC (permalink / raw)
To: Sven Schnelle
Cc: Pankaj Raghav (Samsung), brauner, akpm, chandan.babu,
linux-fsdevel, djwong, hare, gost.dev, linux-xfs, hch, david,
Zi Yan, yang, linux-kernel, linux-mm, willy, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
On Thu, Aug 29, 2024 at 12:51:25PM +0200, Sven Schnelle wrote:
> Hi,
>
> "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com> writes:
>
> > From: Luis Chamberlain <mcgrof@kernel.org>
> >
> > split_folio() and split_folio_to_list() assume order 0, to support
> > minorder for non-anonymous folios, we must expand these to check the
> > folio mapping order and use that.
> >
> > Set new_order to be at least minimum folio order if it is set in
> > split_huge_page_to_list() so that we can maintain minimum folio order
> > requirement in the page cache.
> >
> > Update the debugfs write files used for testing to ensure the order
> > is respected as well. We simply enforce the min order when a file
> > mapping is used.
> >
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > Tested-by: David Howells <dhowells@redhat.com>
>
> This causes the following warning on s390 with linux-next starting from
> next-20240827:
>
> [ 112.690518] BUG: Bad page map in process ksm01 pte:a5801317 pmd:99054000
> [ 112.690531] page: refcount:0 mapcount:-1 mapping:0000000000000000 index:0x3ff86102 pfn:0xa5801
> [ 112.690536] flags: 0x3ffff00000000004(referenced|node=0|zone=1|lastcpupid=0x1ffff)
> [ 112.690543] raw: 3ffff00000000004 0000001d47439e30 0000001d47439e30 0000000000000000
> [ 112.690546] raw: 000000003ff86102 0000000000000000 fffffffe00000000 0000000000000000
> [ 112.690548] page dumped because: bad pte
> [ 112.690549] addr:000003ff86102000 vm_flags:88100073 anon_vma:000000008c8e46e8 mapping:0000000000000000 index:3ff86102
> [ 112.690553] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> [ 112.690561] CPU: 1 UID: 0 PID: 604 Comm: ksm01 Not tainted 6.11.0-rc5-next-20240827-dirty #1441
> [ 112.690565] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
> [ 112.690568] Call Trace:
> [ 112.690571] [<000003ffe0eb77fe>] dump_stack_lvl+0x76/0xa0
> [ 112.690579] [<000003ffe03f4a90>] print_bad_pte+0x280/0x2d0
> [ 112.690584] [<000003ffe03f7654>] zap_present_ptes.isra.0+0x5c4/0x870
> [ 112.690598] [<000003ffe03f7a46>] zap_pte_range+0x146/0x3d0
> [ 112.690601] [<000003ffe03f7f1c>] zap_p4d_range+0x24c/0x4b0
> [ 112.690603] [<000003ffe03f84ea>] unmap_page_range+0xea/0x2c0
> [ 112.690605] [<000003ffe03f8754>] unmap_single_vma.isra.0+0x94/0xf0
> [ 112.690607] [<000003ffe03f8866>] unmap_vmas+0xb6/0x1a0
> [ 112.690609] [<000003ffe0405724>] exit_mmap+0xc4/0x3e0
> [ 112.690613] [<000003ffe0154aa2>] mmput+0x72/0x170
> [ 112.690616] [<000003ffe015e2c6>] exit_mm+0xd6/0x150
> [ 112.690618] [<000003ffe015e52c>] do_exit+0x1ec/0x490
> [ 112.690620] [<000003ffe015e9a4>] do_group_exit+0x44/0xc0
> [ 112.690621] [<000003ffe016f000>] get_signal+0x7f0/0x800
> [ 112.690624] [<000003ffe0108614>] arch_do_signal_or_restart+0x74/0x320
> [ 112.690628] [<000003ffe020c876>] syscall_exit_to_user_mode_work+0xe6/0x170
> [ 112.690632] [<000003ffe0eb7c04>] __do_syscall+0xd4/0x1c0
> [ 112.690634] [<000003ffe0ec303c>] system_call+0x74/0x98
> [ 112.690638] Disabling lock debugging due to kernel taint
>
> To reproduce, running the ksm01 testsuite from ltp seems to be
> enough. The splat is always triggered immediately. The output from ksm01
> is:
>
> tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> tst_test.c:1809: TINFO: LTP version: 20240524-208-g6c3293c6f
> tst_test.c:1813: TINFO: Tested kernel: 6.11.0-rc5-next-20240827 #1440 SMP Thu Aug 29 12:13:28 CEST 2024 s390x
> tst_test.c:1652: TINFO: Timeout per run is 0h 00m 30s
> mem.c:422: TINFO: wait for all children to stop.
> mem.c:388: TINFO: child 0 stops.
> mem.c:388: TINFO: child 1 stops.
> mem.c:388: TINFO: child 2 stops.
> mem.c:495: TINFO: KSM merging...
> mem.c:434: TINFO: resume all children.
> mem.c:422: TINFO: wait for all children to stop.
> mem.c:344: TINFO: child 0 continues...
> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
> mem.c:344: TINFO: child 1 continues...
> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
> mem.c:344: TINFO: child 2 continues...
> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
> mem.c:400: TINFO: child 1 stops.
> mem.c:400: TINFO: child 2 stops.
> mem.c:400: TINFO: child 0 stops.
> Test timeouted, sending SIGKILL!
> tst_test.c:1700: TINFO: Killed the leftover descendant processes
> tst_test.c:1706: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> tst_test.c:1708: TBROK: Test killed! (timeout?)
Thanks for the report and test reproducer. I was able to reproduce on
x86_64 easily with ltp/testcases/kernel/mem/ksm/ksm01 as well:
ltp/testcases/kernel/mem/ksm/ksm01
tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
tst_test.c:1809: TINFO: LTP version: 20240524-208-g6c3293c6fc20
tst_test.c:1813: TINFO: Tested kernel: 6.11.0-rc5-next-20240827 #56 SMP PREEMPT_DYNAMIC Tue Aug 27 08:10:26 UTC 2024 x86_64
tst_test.c:1652: TINFO: Timeout per run is 0h 00m 30s
mem.c:422: TINFO: wait for all children to stop.
mem.c:388: TINFO: child 0 stops.
mem.c:388: TINFO: child 1 stops.
mem.c:388: TINFO: child 2 stops.
mem.c:495: TINFO: KSM merging...
mem.c:434: TINFO: resume all children.
mem.c:422: TINFO: wait for all children to stop.
mem.c:344: TINFO: child 0 continues...
mem.c:344: TINFO: child 2 continues...
mem.c:344: TINFO: child 1 continues...
mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
mem.c:400: TINFO: child 1 stops.
mem.c:400: TINFO: child 0 stops.
mem.c:400: TINFO: child 2 stops.
Test timeouted, sending SIGKILL!
tst_test.c:1700: TINFO: Killed the leftover descendant processes
tst_test.c:1706: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1708: TBROK: Test killed! (timeout?)
Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
With VM debugging, however, I get more information about the issue:
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page: refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f589dd7f pfn:0x211d7f
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: memcg:ffff93ba245b8800
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: anon flags: 0x17fffe000020838(uptodate|dirty|lru|owner_2|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 017fffe000020838 ffffe59008475f88 ffffe59008476008 ffff93ba2abca5b1
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 00000007f589dd7f 0000000000000000 0000000100000000 ffff93ba245b8800
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ------------[ cut here ]------------
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kernel BUG at mm/filemap.c:1509!
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CPU: 2 UID: 0 PID: 74 Comm: ksmd Not tainted 6.11.0-rc5-next-20240827 #56
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RIP: 0010:folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Code: 93 fc ff ff f0 80 30 01 78 06 5b c3 cc cc cc cc 48 89 df 31 f6 5b e9 dc fc ff ff 48 c7 c6 a0 56 49 89 48 89 df e8 2d 03 05 00 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RSP: 0018:ffffbb1dc02afe38 EFLAGS: 00010246
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RAX: 000000000000003f RBX: ffffe59008475fc0 RCX: 0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000003
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R10: ffffbb1dc02afce0 R11: ffffffff896c3608 R12: ffffe59008475fc0
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R13: 0000000000000000 R14: ffffe59008470000 R15: ffffffff89f88060
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: FS: 0000000000000000(0000) GS:ffff93c15fc80000(0000) knlGS:0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CR2: 0000558e368d9c48 CR3: 000000010ca66004 CR4: 0000000000770ef0
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: PKRU: 55555554
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Call Trace:
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: <TASK>
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? die+0x32/0x80
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_trap+0xd9/0x100
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_error_trap+0x6a/0x90
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? exc_invalid_op+0x4c/0x60
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? asm_exc_invalid_op+0x16/0x20
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ksm_scan_thread+0x175b/0x1d30
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_ksm_scan_thread+0x10/0x10
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kthread+0xda/0x110
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork+0x2d/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork_asm+0x1a/0x30
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: </TASK>
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Modules linked in: xfs nvme_fabrics 9p kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd pcspkr 9pnet_virtio joydev virtio_balloon virtio_console evdev button serio_raw drm nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod virtio_net net_failover virtio_blk failover nvme psmouse crc32_pclmul crc32c_intel nvme_core virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ---[ end trace 0000000000000000 ]---
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RIP: 0010:folio_unlock+0x43/0x50
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Code: 93 fc ff ff f0 80 30 01 78 06 5b c3 cc cc cc cc 48 89 df 31 f6 5b e9 dc fc ff ff 48 c7 c6 a0 56 49 89 48 89 df e8 2d 03 05 00 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RSP: 0018:ffffbb1dc02afe38 EFLAGS: 00010246
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RAX: 000000000000003f RBX: ffffe59008475fc0 RCX: 0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000003
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R10: ffffbb1dc02afce0 R11: ffffffff896c3608 R12: ffffe59008475fc0
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R13: 0000000000000000 R14: ffffe59008470000 R15: ffffffff89f88060
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: FS: 0000000000000000(0000) GS:ffff93c15fc80000(0000) knlGS:0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CR2: 0000558e368d9c48 CR3: 000000010ca66004 CR4: 0000000000770ef0
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: PKRU: 55555554
Looking at the KSM code, the context at ksm_scan_thread+0x175b is the mm/ksm.c
routine cmp_and_merge_page(), in the split case:
} else if (split) {
/*
* We are here if we tried to merge two pages and
* failed because they both belonged to the same
* compound page. We will split the page now, but no
* merging will take place.
* We do not want to add the cost of a full lock; if
* the page is locked, it is better to skip it and
* perhaps try again later.
*/
if (!trylock_page(page))
return;
split_huge_page(page);
unlock_page(page);
}
The trylock_page() is faulty. I'm digging in further.
Luis
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 18:46 ` Luis Chamberlain
@ 2024-08-29 19:55 ` Matthew Wilcox
2024-08-29 22:12 ` Zi Yan
0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2024-08-29 19:55 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Sven Schnelle, Pankaj Raghav (Samsung), brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, Zi Yan, yang, linux-kernel, linux-mm, john.g.garry,
cl, p.raghav, ryan.roberts, David Howells, linux-s390
On Thu, Aug 29, 2024 at 11:46:42AM -0700, Luis Chamberlain wrote:
> With vm debugging however I get more information about the issue:
>
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page: refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f589dd7f pfn:0x211d7f
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: memcg:ffff93ba245b8800
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: anon flags: 0x17fffe000020838(uptodate|dirty|lru|owner_2|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 017fffe000020838 ffffe59008475f88 ffffe59008476008 ffff93ba2abca5b1
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 00000007f589dd7f 0000000000000000 0000000100000000 ffff93ba245b8800
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ------------[ cut here ]------------
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kernel BUG at mm/filemap.c:1509!
This is in folio_unlock(). We're trying to unlock a folio which isn't
locked!
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CPU: 2 UID: 0 PID: 74 Comm: ksmd Not tainted 6.11.0-rc5-next-20240827 #56
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RIP: 0010:folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Code: 93 fc ff ff f0 80 30 01 78 06 5b c3 cc cc cc cc 48 89 df 31 f6 5b e9 dc fc ff ff 48 c7 c6 a0 56 49 89 48 89 df e8 2d 03 05 00 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RSP: 0018:ffffbb1dc02afe38 EFLAGS: 00010246
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RAX: 000000000000003f RBX: ffffe59008475fc0 RCX: 0000000000000000
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000003
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R10: ffffbb1dc02afce0 R11: ffffffff896c3608 R12: ffffe59008475fc0
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R13: 0000000000000000 R14: ffffe59008470000 R15: ffffffff89f88060
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: FS: 0000000000000000(0000) GS:ffff93c15fc80000(0000) knlGS:0000000000000000
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CR2: 0000558e368d9c48 CR3: 000000010ca66004 CR4: 0000000000770ef0
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: PKRU: 55555554
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Call Trace:
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: <TASK>
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? die+0x32/0x80
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_trap+0xd9/0x100
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_error_trap+0x6a/0x90
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? exc_invalid_op+0x4c/0x60
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? asm_exc_invalid_op+0x16/0x20
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ksm_scan_thread+0x175b/0x1d30
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_ksm_scan_thread+0x10/0x10
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kthread+0xda/0x110
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork+0x2d/0x50
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork_asm+0x1a/0x30
> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: </TASK>
[...]
> Looking at the KSM code in context ksm_scan_thread+0x175 is mm/ksm.c routine
> cmp_and_merge_page() on the split case:
>
> } else if (split) {
> /*
> * We are here if we tried to merge two pages and
> * failed because they both belonged to the same
> * compound page. We will split the page now, but no
> * merging will take place.
> * We do not want to add the cost of a full lock; if
> * the page is locked, it is better to skip it and
> * perhaps try again later.
> */
> if (!trylock_page(page))
> return;
> split_huge_page(page);
> unlock_page(page);
Obviously the page is locked when we call split_huge_page(). There's
an assert inside it. And the lock bit is _supposed_ to be transferred
to the head page of the page which is being split. My guess is that
this is messed up somehow; we're perhaps transferring the lock bit to
the wrong page?
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 19:55 ` Matthew Wilcox
@ 2024-08-29 22:12 ` Zi Yan
2024-08-29 23:41 ` Luis Chamberlain
0 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2024-08-29 22:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Luis Chamberlain, Sven Schnelle, Pankaj Raghav (Samsung), brauner,
akpm, chandan.babu, linux-fsdevel, djwong, hare, gost.dev,
linux-xfs, hch, david, yang, linux-kernel, linux-mm, john.g.garry,
cl, p.raghav, ryan.roberts, David Howells, linux-s390
On 29 Aug 2024, at 15:55, Matthew Wilcox wrote:
> On Thu, Aug 29, 2024 at 11:46:42AM -0700, Luis Chamberlain wrote:
>> With vm debugging however I get more information about the issue:
>>
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page: refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f589dd7f pfn:0x211d7f
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: memcg:ffff93ba245b8800
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: anon flags: 0x17fffe000020838(uptodate|dirty|lru|owner_2|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 017fffe000020838 ffffe59008475f88 ffffe59008476008 ffff93ba2abca5b1
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: raw: 00000007f589dd7f 0000000000000000 0000000100000000 ffff93ba245b8800
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ------------[ cut here ]------------
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kernel BUG at mm/filemap.c:1509!
>
> This is in folio_unlock(). We're trying to unlock a folio which isn't
> locked!
>
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CPU: 2 UID: 0 PID: 74 Comm: ksmd Not tainted 6.11.0-rc5-next-20240827 #56
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RIP: 0010:folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Code: 93 fc ff ff f0 80 30 01 78 06 5b c3 cc cc cc cc 48 89 df 31 f6 5b e9 dc fc ff ff 48 c7 c6 a0 56 49 89 48 89 df e8 2d 03 05 00 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RSP: 0018:ffffbb1dc02afe38 EFLAGS: 00010246
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RAX: 000000000000003f RBX: ffffe59008475fc0 RCX: 0000000000000000
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000003
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R10: ffffbb1dc02afce0 R11: ffffffff896c3608 R12: ffffe59008475fc0
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: R13: 0000000000000000 R14: ffffe59008470000 R15: ffffffff89f88060
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: FS: 0000000000000000(0000) GS:ffff93c15fc80000(0000) knlGS:0000000000000000
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: CR2: 0000558e368d9c48 CR3: 000000010ca66004 CR4: 0000000000770ef0
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: PKRU: 55555554
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: Call Trace:
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: <TASK>
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? die+0x32/0x80
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_trap+0xd9/0x100
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? do_error_trap+0x6a/0x90
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? exc_invalid_op+0x4c/0x60
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? asm_exc_invalid_op+0x16/0x20
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? folio_unlock+0x43/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ksm_scan_thread+0x175b/0x1d30
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_ksm_scan_thread+0x10/0x10
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: kthread+0xda/0x110
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork+0x2d/0x50
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ? __pfx_kthread+0x10/0x10
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: ret_from_fork_asm+0x1a/0x30
>> Aug 29 18:08:22 nvme-xfs-reflink-4k kernel: </TASK>
> [...]
>> Looking at the KSM code in context ksm_scan_thread+0x175 is mm/ksm.c routine
>> cmp_and_merge_page() on the split case:
>>
>> } else if (split) {
>> /*
>> * We are here if we tried to merge two pages and
>> * failed because they both belonged to the same
>> * compound page. We will split the page now, but no
>> * merging will take place.
>> * We do not want to add the cost of a full lock; if
>> * the page is locked, it is better to skip it and
>> * perhaps try again later.
>> */
>> if (!trylock_page(page))
>> return;
>> split_huge_page(page);
>> unlock_page(page);
>
> Obviously the page is locked when we call split_huge_page(). There's
> an assert inside it. And the lock bit is _supposed_ to be transferred
> to the head page of the page which is being split. My guess is that
> this is messed up somehow; we're perhaps transferring the lock bit to
> the wrong page?
The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
unlock the wrong subpage. split_huge_page() used to pass the “page” pointer
to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
But this patch changes the “page” passed into split_huge_page_to_list_to_order()
to always be the head page.
This fixes the crash on my x86 VM, but it can be improved:
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7c50aeed0522..eff5d2fb5d4e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
int split_folio_to_list(struct folio *folio, struct list_head *list);
-static inline int split_huge_page(struct page *page)
-{
- return split_folio(page_folio(page));
-}
+int split_huge_page(struct page *page);
void deferred_split_folio(struct folio *folio);
void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c29af9451d92..4d723dab4336 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
return ret;
}
+int split_huge_page(struct page *page)
+{
+ unsigned int min_order = 0;
+ struct folio *folio = page_folio(page);
+
+ if (folio_test_anon(folio))
+ goto out;
+
+ if (!folio->mapping) {
+ if (folio_test_pmd_mappable(folio))
+ count_vm_event(THP_SPLIT_PAGE_FAILED);
+ return -EBUSY;
+ }
+
+ min_order = mapping_min_folio_order(folio->mapping);
+out:
+ return split_huge_page_to_list_to_order(page, NULL, min_order);
+}
+
int split_folio_to_list(struct folio *folio, struct list_head *list)
{
unsigned int min_order = 0;
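
(Purely for illustration, a minimal user-space model of the mis-transferred
lock described above; the names and the split behaviour are hypothetical
stand-ins, not kernel code. It only shows why unlocking the subpage the caller
locked trips the VM_BUG_ON once the split leaves the head page locked instead.)

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_page { bool locked; };

/* Old behaviour: after the split, the subpage the caller passed stays locked. */
static void split_keep_caller_locked(struct toy_page *pages, int nr, int caller)
{
	for (int i = 0; i < nr; i++)
		pages[i].locked = (i == caller);
}

/* Behaviour after the patch: only the head page ends up locked. */
static void split_keep_head_locked(struct toy_page *pages, int nr, int caller)
{
	(void)caller;
	for (int i = 0; i < nr; i++)
		pages[i].locked = (i == 0);
}

/* Models the VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in folio_unlock(). */
static void toy_unlock(struct toy_page *p)
{
	assert(p->locked);
	p->locked = false;
}

int main(void)
{
	struct toy_page pages[4] = { { false } };
	int caller = 2;			/* KSM locked a tail page     */

	pages[caller].locked = true;	/* trylock_page(page)         */
	split_keep_caller_locked(pages, 4, caller);
	toy_unlock(&pages[caller]);	/* unlock_page(page): fine    */

	pages[caller].locked = true;
	split_keep_head_locked(pages, 4, caller);
	toy_unlock(&pages[caller]);	/* unlock_page(page): asserts */

	printf("not reached with the head-locked split\n");
	return 0;
}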
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 22:12 ` Zi Yan
@ 2024-08-29 23:41 ` Luis Chamberlain
2024-08-30 5:57 ` Sven Schnelle
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Luis Chamberlain @ 2024-08-29 23:41 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox, Sven Schnelle, Pankaj Raghav (Samsung), brauner,
akpm, chandan.babu, linux-fsdevel, djwong, hare, gost.dev,
linux-xfs, hch, david, yang, linux-kernel, linux-mm, john.g.garry,
cl, p.raghav, ryan.roberts, David Howells, linux-s390
On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
> The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
> unlocks the wrong subpage. split_huge_page() used to pass the “page” pointer
> to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
> But this patch changes the “page” passed into split_huge_page_to_list_to_order()
> always to the head page.
>
> This fixes the crash on my x86 VM, but it can be improved:
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7c50aeed0522..eff5d2fb5d4e 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> unsigned int new_order);
> int split_folio_to_list(struct folio *folio, struct list_head *list);
> -static inline int split_huge_page(struct page *page)
> -{
> - return split_folio(page_folio(page));
> -}
> +int split_huge_page(struct page *page);
> void deferred_split_folio(struct folio *folio);
>
> void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c29af9451d92..4d723dab4336 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> return ret;
> }
>
> +int split_huge_page(struct page *page)
> +{
> + unsigned int min_order = 0;
> + struct folio *folio = page_folio(page);
> +
> + if (folio_test_anon(folio))
> + goto out;
> +
> + if (!folio->mapping) {
> + if (folio_test_pmd_mappable(folio))
> + count_vm_event(THP_SPLIT_PAGE_FAILED);
> + return -EBUSY;
> + }
> +
> + min_order = mapping_min_folio_order(folio->mapping);
> +out:
> + return split_huge_page_to_list_to_order(page, NULL, min_order);
> +}
> +
> int split_folio_to_list(struct folio *folio, struct list_head *list)
> {
> unsigned int min_order = 0;
Confirmed. And although you suggest it can be improved, I thought we could do
that by sharing more code and putting things in the headers. The patch below
also fixes this and tries to share more code, but I think it is perhaps harder
to understand than your patch.

So I think your patch is the cleaner and easier fix.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c275aa9cc105..99cd9c7bf55b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -97,6 +97,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
#define split_folio(f) split_folio_to_list(f, NULL)
+#define split_folio_to_list(f, list) split_page_folio_to_list(&f->page, f, list)
#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
#define HPAGE_PMD_SHIFT PMD_SHIFT
@@ -331,10 +332,11 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
-int split_folio_to_list(struct folio *folio, struct list_head *list);
+int split_page_folio_to_list(struct page *page, struct folio *folio,
+ struct list_head *list);
static inline int split_huge_page(struct page *page)
{
- return split_folio(page_folio(page));
+ return split_page_folio_to_list(page, page_folio(page), NULL);
}
void deferred_split_folio(struct folio *folio);
@@ -511,7 +513,9 @@ static inline int split_huge_page(struct page *page)
return 0;
}
-static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
+static inline int split_page_folio_to_list(struct page *page,
+ struct folio *folio,
+ struct list_head *list)
{
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 169f1a71c95d..b115bfe63b52 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3529,7 +3529,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
return ret;
}
-int split_folio_to_list(struct folio *folio, struct list_head *list)
+int split_page_folio_to_list(struct page *page, struct folio *folio,
+ struct list_head *list)
{
unsigned int min_order = 0;
@@ -3544,8 +3545,7 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
min_order = mapping_min_folio_order(folio->mapping);
out:
- return split_huge_page_to_list_to_order(&folio->page, list,
- min_order);
+ return split_huge_page_to_list_to_order(page, list, min_order);
}
void __folio_undo_large_rmappable(struct folio *folio)
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 23:41 ` Luis Chamberlain
@ 2024-08-30 5:57 ` Sven Schnelle
2024-08-30 11:58 ` Daniel Gomez
2024-08-30 14:59 ` Pankaj Raghav
2 siblings, 0 replies; 12+ messages in thread
From: Sven Schnelle @ 2024-08-30 5:57 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Zi Yan, Matthew Wilcox, Pankaj Raghav (Samsung), brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, yang, linux-kernel, linux-mm, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
Luis Chamberlain <mcgrof@kernel.org> writes:
> On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
>> The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
>> unlocks the wrong subpage. split_huge_page() used to pass the “page” pointer
>> to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
>> But this patch changes the “page” passed into split_huge_page_to_list_to_order()
>> always to the head page.
>>
>> This fixes the crash on my x86 VM, but it can be improved:
It also fixes the issue on s390x. Thanks!
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 23:41 ` Luis Chamberlain
2024-08-30 5:57 ` Sven Schnelle
@ 2024-08-30 11:58 ` Daniel Gomez
2024-08-30 14:59 ` Pankaj Raghav
2 siblings, 0 replies; 12+ messages in thread
From: Daniel Gomez @ 2024-08-30 11:58 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Zi Yan, Matthew Wilcox, Sven Schnelle, Pankaj Raghav (Samsung),
brauner, akpm, chandan.babu, linux-fsdevel, djwong, hare,
gost.dev, linux-xfs, hch, david, yang, linux-kernel, linux-mm,
john.g.garry, cl, p.raghav, ryan.roberts, David Howells,
linux-s390
On Thu, Aug 29, 2024 at 04:41:48PM -0700, Luis Chamberlain wrote:
> On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
> > The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
> > unlocks the wrong subpage. split_huge_page() used to pass the “page” pointer
> > to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
> > But this patch changes the “page” passed into split_huge_page_to_list_to_order()
> > always to the head page.
> >
> > This fixes the crash on my x86 VM, but it can be improved:
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 7c50aeed0522..eff5d2fb5d4e 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
> > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > unsigned int new_order);
> > int split_folio_to_list(struct folio *folio, struct list_head *list);
> > -static inline int split_huge_page(struct page *page)
> > -{
> > - return split_folio(page_folio(page));
> > -}
> > +int split_huge_page(struct page *page);
> > void deferred_split_folio(struct folio *folio);
> >
> > void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index c29af9451d92..4d723dab4336 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > return ret;
> > }
> >
> > +int split_huge_page(struct page *page)
> > +{
> > + unsigned int min_order = 0;
> > + struct folio *folio = page_folio(page);
> > +
> > + if (folio_test_anon(folio))
> > + goto out;
> > +
> > + if (!folio->mapping) {
> > + if (folio_test_pmd_mappable(folio))
> > + count_vm_event(THP_SPLIT_PAGE_FAILED);
> > + return -EBUSY;
> > + }
> > +
> > + min_order = mapping_min_folio_order(folio->mapping);
> > +out:
> > + return split_huge_page_to_list_to_order(page, NULL, min_order);
> > +}
> > +
> > int split_folio_to_list(struct folio *folio, struct list_head *list)
> > {
> > unsigned int min_order = 0;
>
>
> Confirmed, and also although you suggest it can be improved, I thought
> that we could do that by sharing more code and putting things in the
> headers, the below also fixes this but tries to share more code, but
> I think it is perhaps less easier to understand than your patch.
>
> So I think your patch is cleaner and easier as a fix.
I reproduced it on arm64 as well, and both of the provided fixes solved the issue.
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c275aa9cc105..99cd9c7bf55b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -97,6 +97,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
>
> #define split_folio(f) split_folio_to_list(f, NULL)
> +#define split_folio_to_list(f, list) split_page_folio_to_list(&f->page, f, list)
>
> #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
> #define HPAGE_PMD_SHIFT PMD_SHIFT
> @@ -331,10 +332,11 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
> bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> unsigned int new_order);
> -int split_folio_to_list(struct folio *folio, struct list_head *list);
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> + struct list_head *list);
> static inline int split_huge_page(struct page *page)
> {
> - return split_folio(page_folio(page));
> + return split_page_folio_to_list(page, page_folio(page), NULL);
> }
> void deferred_split_folio(struct folio *folio);
>
> @@ -511,7 +513,9 @@ static inline int split_huge_page(struct page *page)
> return 0;
> }
>
> -static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
> +static inline int split_page_folio_to_list(struct page *page,
> + struct folio *folio,
> + struct list_head *list)
> {
> return 0;
> }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 169f1a71c95d..b115bfe63b52 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3529,7 +3529,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> return ret;
> }
>
> -int split_folio_to_list(struct folio *folio, struct list_head *list)
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> + struct list_head *list)
> {
> unsigned int min_order = 0;
>
> @@ -3544,8 +3545,7 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
>
> min_order = mapping_min_folio_order(folio->mapping);
> out:
> - return split_huge_page_to_list_to_order(&folio->page, list,
> - min_order);
> + return split_huge_page_to_list_to_order(page, list, min_order);
> }
>
> void __folio_undo_large_rmappable(struct folio *folio)
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-29 23:41 ` Luis Chamberlain
2024-08-30 5:57 ` Sven Schnelle
2024-08-30 11:58 ` Daniel Gomez
@ 2024-08-30 14:59 ` Pankaj Raghav
2024-08-30 17:12 ` Luis Chamberlain
` (2 more replies)
2 siblings, 3 replies; 12+ messages in thread
From: Pankaj Raghav @ 2024-08-30 14:59 UTC (permalink / raw)
To: Luis Chamberlain, Zi Yan
Cc: Matthew Wilcox, Sven Schnelle, brauner, akpm, chandan.babu,
linux-fsdevel, djwong, hare, gost.dev, linux-xfs, hch, david,
yang, linux-kernel, linux-mm, john.g.garry, cl, p.raghav,
ryan.roberts, David Howells, linux-s390
On 30/08/2024 01:41, Luis Chamberlain wrote:
> On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
>> The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
>> unlocks the wrong subpage. split_huge_page() used to pass the “page” pointer
>> to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
>> But this patch changes the “page” passed into split_huge_page_to_list_to_order()
>> always to the head page.
>>
>> This fixes the crash on my x86 VM, but it can be improved:
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 7c50aeed0522..eff5d2fb5d4e 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
>> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>> unsigned int new_order);
>> int split_folio_to_list(struct folio *folio, struct list_head *list);
>> -static inline int split_huge_page(struct page *page)
>> -{
>> - return split_folio(page_folio(page));
>> -}
>> +int split_huge_page(struct page *page);
>> void deferred_split_folio(struct folio *folio);
>>
>> void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c29af9451d92..4d723dab4336 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>> return ret;
>> }
>>
>> +int split_huge_page(struct page *page)
>> +{
>> + unsigned int min_order = 0;
>> + struct folio *folio = page_folio(page);
>> +
>> + if (folio_test_anon(folio))
>> + goto out;
>> +
>> + if (!folio->mapping) {
>> + if (folio_test_pmd_mappable(folio))
>> + count_vm_event(THP_SPLIT_PAGE_FAILED);
>> + return -EBUSY;
>> + }
>> +
>> + min_order = mapping_min_folio_order(folio->mapping);
>> +out:
>> + return split_huge_page_to_list_to_order(page, NULL, min_order);
>> +}
>> +
>> int split_folio_to_list(struct folio *folio, struct list_head *list)
>> {
>> unsigned int min_order = 0;
>
>
> Confirmed, and also although you suggest it can be improved, I thought
> that we could do that by sharing more code and putting things in the
> headers, the below also fixes this but tries to share more code, but
> I think it is perhaps less easier to understand than your patch.
>
It feels a bit weird to pass both the folio and the page to `split_page_folio_to_list()`.
How about we extract the code that returns the min order so that we don't repeat it?
Something like this:
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c275aa9cc105..d27febd5c639 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -331,10 +331,24 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
+int min_order_for_split(struct folio *folio);
int split_folio_to_list(struct folio *folio, struct list_head *list);
static inline int split_huge_page(struct page *page)
{
- return split_folio(page_folio(page));
+ struct folio *folio = page_folio(page);
+ int ret = min_order_for_split(folio);
+
+ if (ret)
+ return ret;
+
+ /*
+ * split_huge_page() locks the page before splitting and
+ * expects the same page that has been split to be locked when
+ * returned. split_folio_to_list() cannot be used here because
+ * it converts the page to folio and passes the head page to be
+ * split.
+ */
+ return split_huge_page_to_list_to_order(page, NULL, ret);
}
void deferred_split_folio(struct folio *folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 169f1a71c95d..b167e036d01b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3529,12 +3529,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
return ret;
}
-int split_folio_to_list(struct folio *folio, struct list_head *list)
+int min_order_for_split(struct folio *folio)
{
- unsigned int min_order = 0;
-
if (folio_test_anon(folio))
- goto out;
+ return 0;
if (!folio->mapping) {
if (folio_test_pmd_mappable(folio))
@@ -3542,10 +3540,17 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
return -EBUSY;
}
- min_order = mapping_min_folio_order(folio->mapping);
-out:
- return split_huge_page_to_list_to_order(&folio->page, list,
- min_order);
+ return mapping_min_folio_order(folio->mapping);
+}
+
+int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+ int ret = min_order_for_split(folio);
+
+ if (ret)
+ return ret;
+
+ return split_huge_page_to_list_to_order(&folio->page, list, ret);
}
void __folio_undo_large_rmappable(struct folio *folio)
> So I think your patch is cleaner and easier as a fix.
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c275aa9cc105..99cd9c7bf55b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -97,6 +97,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
>
> #define split_folio(f) split_folio_to_list(f, NULL)
> +#define split_folio_to_list(f, list) split_page_folio_to_list(&f->page, f, list)
>
> #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
> #define HPAGE_PMD_SHIFT PMD_SHIFT
> @@ -331,10 +332,11 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
> bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> unsigned int new_order);
> -int split_folio_to_list(struct folio *folio, struct list_head *list);
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> + struct list_head *list);
> static inline int split_huge_page(struct page *page)
> {
> - return split_folio(page_folio(page));
> + return split_page_folio_to_list(page, page_folio(page), NULL);
> }
> void deferred_split_folio(struct folio *folio);
>
> @@ -511,7 +513,9 @@ static inline int split_huge_page(struct page *page)
> return 0;
> }
>
> -static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
> +static inline int split_page_folio_to_list(struct page *page,
> + struct folio *folio,
> + struct list_head *list)
> {
> return 0;
> }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 169f1a71c95d..b115bfe63b52 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3529,7 +3529,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> return ret;
> }
>
> -int split_folio_to_list(struct folio *folio, struct list_head *list)
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> + struct list_head *list)
> {
> unsigned int min_order = 0;
>
> @@ -3544,8 +3545,7 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
>
> min_order = mapping_min_folio_order(folio->mapping);
> out:
> - return split_huge_page_to_list_to_order(&folio->page, list,
> - min_order);
> + return split_huge_page_to_list_to_order(page, list, min_order);
> }
>
> void __folio_undo_large_rmappable(struct folio *folio)
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-30 14:59 ` Pankaj Raghav
@ 2024-08-30 17:12 ` Luis Chamberlain
2024-08-31 22:38 ` Zi Yan
2024-08-30 22:42 ` Matthew Wilcox
2024-08-31 22:35 ` Zi Yan
2 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2024-08-30 17:12 UTC (permalink / raw)
To: Pankaj Raghav
Cc: Zi Yan, Matthew Wilcox, Sven Schnelle, brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, yang, linux-kernel, linux-mm, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
On Fri, Aug 30, 2024 at 04:59:57PM +0200, Pankaj Raghav wrote:
> It feels a bit weird to pass both folio and the page in `split_page_folio_to_list()`.
Agreed.
> How about we extract the code that returns the min order so that we don't repeat.
> Something like this:
Of all the solutions I like this one the most. Zi, do you have any preference?
Luis
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-30 14:59 ` Pankaj Raghav
2024-08-30 17:12 ` Luis Chamberlain
@ 2024-08-30 22:42 ` Matthew Wilcox
2024-08-31 22:35 ` Zi Yan
2 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2024-08-30 22:42 UTC (permalink / raw)
To: Pankaj Raghav
Cc: Luis Chamberlain, Zi Yan, Sven Schnelle, brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, yang, linux-kernel, linux-mm, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
On Fri, Aug 30, 2024 at 04:59:57PM +0200, Pankaj Raghav wrote:
> It feels a bit weird to pass both folio and the page in `split_page_folio_to_list()`.
We do that in the rmap code.
But this is not a performance path. We should keep this as simple as
possible.
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-30 14:59 ` Pankaj Raghav
2024-08-30 17:12 ` Luis Chamberlain
2024-08-30 22:42 ` Matthew Wilcox
@ 2024-08-31 22:35 ` Zi Yan
2 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2024-08-31 22:35 UTC (permalink / raw)
To: Pankaj Raghav
Cc: Luis Chamberlain, Matthew Wilcox, Sven Schnelle, brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, yang, linux-kernel, linux-mm, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
On 30 Aug 2024, at 10:59, Pankaj Raghav wrote:
> On 30/08/2024 01:41, Luis Chamberlain wrote:
>> On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
>>> The issue is that the change to split_huge_page() makes split_huge_page_to_list_to_order()
>>> unlocks the wrong subpage. split_huge_page() used to pass the “page” pointer
>>> to split_huge_page_to_list_to_order(), which keeps that “page” still locked.
>>> But this patch changes the “page” passed into split_huge_page_to_list_to_order()
>>> always to the head page.
>>>
>>> This fixes the crash on my x86 VM, but it can be improved:
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 7c50aeed0522..eff5d2fb5d4e 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
>>> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> unsigned int new_order);
>>> int split_folio_to_list(struct folio *folio, struct list_head *list);
>>> -static inline int split_huge_page(struct page *page)
>>> -{
>>> - return split_folio(page_folio(page));
>>> -}
>>> +int split_huge_page(struct page *page);
>>> void deferred_split_folio(struct folio *folio);
>>>
>>> void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index c29af9451d92..4d723dab4336 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> return ret;
>>> }
>>>
>>> +int split_huge_page(struct page *page)
>>> +{
>>> + unsigned int min_order = 0;
>>> + struct folio *folio = page_folio(page);
>>> +
>>> + if (folio_test_anon(folio))
>>> + goto out;
>>> +
>>> + if (!folio->mapping) {
>>> + if (folio_test_pmd_mappable(folio))
>>> + count_vm_event(THP_SPLIT_PAGE_FAILED);
>>> + return -EBUSY;
>>> + }
>>> +
>>> + min_order = mapping_min_folio_order(folio->mapping);
>>> +out:
>>> + return split_huge_page_to_list_to_order(page, NULL, min_order);
>>> +}
>>> +
>>> int split_folio_to_list(struct folio *folio, struct list_head *list)
>>> {
>>> unsigned int min_order = 0;
>>
>>
>> Confirmed, and also although you suggest it can be improved, I thought
>> that we could do that by sharing more code and putting things in the
>> headers, the below also fixes this but tries to share more code, but
>> I think it is perhaps less easier to understand than your patch.
>>
> It feels a bit weird to pass both folio and the page in `split_page_folio_to_list()`.
>
> How about we extract the code that returns the min order so that we don't repeat.
>
> Something like this:
>
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c275aa9cc105..d27febd5c639 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -331,10 +331,24 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
> bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> unsigned int new_order);
> +int min_order_for_split(struct folio *folio);
> int split_folio_to_list(struct folio *folio, struct list_head *list);
Since split_folio() is no longer used below, this line can be removed.
> static inline int split_huge_page(struct page *page)
> {
> - return split_folio(page_folio(page));
> + struct folio *folio = page_folio(page);
> + int ret = min_order_for_split(folio);
> +
> + if (ret)
> + return ret;
min_order_for_split() returns -EBUSY, 0, or a positive min_order, so this if
statement should be "if (ret < 0)"? (A short stand-alone illustration follows
the quoted patch below.)
> +
> + /*
> + * split_huge_page() locks the page before splitting and
> + * expects the same page that has been split to be locked when
> + * returned. split_folio_to_list() cannot be used here because
> + * it converts the page to folio and passes the head page to be
> + * split.
> + */
> + return split_huge_page_to_list_to_order(page, NULL, ret);
> }
> void deferred_split_folio(struct folio *folio);
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 169f1a71c95d..b167e036d01b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3529,12 +3529,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> return ret;
> }
>
> -int split_folio_to_list(struct folio *folio, struct list_head *list)
> +int min_order_for_split(struct folio *folio)
> {
> - unsigned int min_order = 0;
> -
> if (folio_test_anon(folio))
> - goto out;
> + return 0;
>
> if (!folio->mapping) {
> if (folio_test_pmd_mappable(folio))
> @@ -3542,10 +3540,17 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
> return -EBUSY;
> }
>
> - min_order = mapping_min_folio_order(folio->mapping);
> -out:
> - return split_huge_page_to_list_to_order(&folio->page, list,
> - min_order);
> + return mapping_min_folio_order(folio->mapping);
> +}
> +
> +int split_folio_to_list(struct folio *folio, struct list_head *list)
> +{
> + int ret = min_order_for_split(folio);
> +
> + if (ret)
> + return ret;
Ditto.
> +
> + return split_huge_page_to_list_to_order(&folio->page, list, ret);
> }
>
> void __folio_undo_large_rmappable(struct folio *folio)
>
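
(To make the point above concrete, a small stand-alone sketch, not kernel
code: on an LBS mapping with 16k blocks and 4k pages the minimum folio order
is 2, so a hypothetical min_order_for_split() legitimately returns 2, and a
plain "if (ret)" check bails out with 2 as though it were an error instead of
splitting to that order; only a negative return such as -EBUSY should abort.)

#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: -EBUSY if the folio was truncated, otherwise the
 * mapping's minimum folio order (0 for anonymous folios). */
static int toy_min_order_for_split(int truncated, int mapping_min_order)
{
	return truncated ? -EBUSY : mapping_min_order;
}

static void try_split(int truncated, int mapping_min_order)
{
	int ret = toy_min_order_for_split(truncated, mapping_min_order);

	if (ret)	/* as posted: also trips on a positive min order */
		printf("if (ret):     bailed out with %d\n", ret);
	else
		printf("if (ret):     split to order %d\n", ret);

	if (ret < 0)	/* suggested: only real errors abort the split */
		printf("if (ret < 0): bailed out with %d\n", ret);
	else
		printf("if (ret < 0): split to order %d\n", ret);
}

int main(void)
{
	try_split(0, 2);	/* LBS file mapping, min order 2         */
	try_split(1, 0);	/* truncated folio: both checks bail out */
	return 0;
}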
--
Best Regards,
Yan, Zi
* Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
2024-08-30 17:12 ` Luis Chamberlain
@ 2024-08-31 22:38 ` Zi Yan
0 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2024-08-31 22:38 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Pankaj Raghav, Matthew Wilcox, Sven Schnelle, brauner, akpm,
chandan.babu, linux-fsdevel, djwong, hare, gost.dev, linux-xfs,
hch, david, yang, linux-kernel, linux-mm, john.g.garry, cl,
p.raghav, ryan.roberts, David Howells, linux-s390
On 30 Aug 2024, at 13:12, Luis Chamberlain wrote:
> On Fri, Aug 30, 2024 at 04:59:57PM +0200, Pankaj Raghav wrote:
>> It feels a bit weird to pass both folio and the page in `split_page_folio_to_list()`.
>
> Agreed..
>
>> How about we extract the code that returns the min order so that we don't repeat.
>> Something like this:
>
> Of all solutions I like this the most, Zi, do you have any preference?
No preference, as long as Pankaj fixes the "if (ret)" issue I mentioned in the
other email.
--
Best Regards,
Yan, Zi