* [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests @ 2025-06-23 18:43 Gerald Schaefer 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Gerald Schaefer @ 2025-06-23 18:43 UTC (permalink / raw) To: Andrew Morton, Anshuman Khandual Cc: Matthew Wilcox, David Hildenbrand, LKML, linux-mm Hi, currently working on enabling THP_SWAP and THP_MIGRATION support for s390, and stumbling over the WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)) in debug_vm_pgtable pmd_swap_tests(). The problem is that pmd_pfn() on s390 will use different shift values for leaf (large) and non-leaf PMDs. And when used on swapped PMDs, for which pmd_leaf() will always return false because !pmd_present(), the result is not really well defined. I think that pmd_pfn() is not safe or ever meant to be called on swapped PMD entries, and it doesn't seem to be used in that way anywhere else but debug_vm_pgtable. Also, the whole logic to test the various swap helpers on normal PTE/PMD entries seems wrong to me. It just works by chance, because e.g. __pmd_to_swp_entry() and __swp_entry_to_pmd() are just no-ops on other architectures (also on s390, but only for PTEs), and also pmd_pfn() does not have any dependency on leaf/non-leaf entries there. So, I started with a small patch to make pmd_swap_tests() use a proper swapped PMD entry as input value, similar to how it is already done in pte_swap_exclusive_tests(), and not use pmd_pfn() for compare but rather compare the whole entries, again similar to pte_swap_exclusive_tests(). But then I noticed that such a change would probably also make sense for the other swap tests, and also a small inconsistency in Documentation, where it says e.g. __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE I think this is wrong, those helpers should never operate on present and mapped PTEs, and they certainly don't create any swapped entry from a mapped entry, given that they are just no-ops on most architectures. Instead, in this example, it just returns the arch-dependent representation of a swp_entry_t, which happens to be just the entry itself on most architectures. See also pte_to_swp_entry() / swp_entry_to_pte() in include/linux/swapops.h. Now it became a larger clean-up, and I hope it makes sense. This is all rather new common code for me, so maybe I got things wrong, feedback is welcome. Gerald Schaefer (1): mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- 2 files changed, 38 insertions(+), 25 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 18:43 [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests Gerald Schaefer @ 2025-06-23 18:43 ` Gerald Schaefer 2025-06-23 19:10 ` David Hildenbrand 2025-06-25 4:28 ` Anshuman Khandual 2025-06-23 19:06 ` [RFC PATCH 0/1] " David Hildenbrand 2025-06-24 7:50 ` Anshuman Khandual 2 siblings, 2 replies; 12+ messages in thread From: Gerald Schaefer @ 2025-06-23 18:43 UTC (permalink / raw) To: Andrew Morton, Anshuman Khandual Cc: Matthew Wilcox, David Hildenbrand, LKML, linux-mm The various __pte/pmd_to_swp_entry and __swp_entry_to_pte/pmd helper functions are expected to operate on swapped PTE/PMD entries, not on present and mapped entries. Reflect this in the swap tests by using a swp_entry_t as input value, similar to how it is already done in pte_swap_exclusive_tests(). Move the swap entry creation to init_args() and store it in args, so it can also be used in other functions. The pte/pmd_swap_tests() are also changed to compare entries instead of pfn values, because pte/pmd_pfn() helpers are not expected to operate on swapped entries. E.g. on s390, pmd_pfn() needs different shifts for leaf (large) and non-leaf PMDs. Also update documentation, to reflect that the helpers operate on swapped and not mapped entries, and use correct names, i.e. __swp_to_pte/pmd_entry -> __swp_entry_to_pte/pmd. For consistency, also change pte/pmd_swap_soft_dirty_tests() to use args->swp_entry instead of a present and mapped PTE/PMD. Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> --- Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index af245161d8e7..e2ac76202a85 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -242,13 +242,13 @@ SWAP Page Table Helpers ======================== +---------------------------+--------------------------------------------------+ -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | +| __pte_to_swp_entry | Creates a swap entry (arch) from a swapped PTE | +---------------------------+--------------------------------------------------+ -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | +| __swp_entry_to_pte | Creates a swapped PTE from a swap entry (arch) | +---------------------------+--------------------------------------------------+ -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | +| __pmd_to_swp_entry | Creates a swap entry (arch) from a swapped PMD | +---------------------------+--------------------------------------------------+ -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | +| __swp_entry_to_pmd | Creates a swapped PMD from a swap entry (arch) | +---------------------------+--------------------------------------------------+ | is_migration_entry | Tests a migration (read or write) swapped entry | +-------------------------------+----------------------------------------------+ diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 7731b238b534..3b0f83ed6c2e 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -73,6 +73,8 @@ struct pgtable_debug_args { unsigned long fixed_pud_pfn; unsigned long fixed_pmd_pfn; unsigned long fixed_pte_pfn; + + swp_entry_t swp_entry; }; static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx) @@ -754,12 +756,15 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args) static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args) { - pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); + pte_t pte; if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) return; pr_debug("Validating PTE swap soft dirty\n"); + pte = swp_entry_to_pte(args->swp_entry); + WARN_ON(!is_swap_pte(pte)); + WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte))); WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte))); } @@ -793,7 +798,9 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) return; pr_debug("Validating PMD swap soft dirty\n"); - pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); + pmd = swp_entry_to_pmd(args->swp_entry); + WARN_ON(!is_swap_pmd(pmd)); + WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd))); WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd))); } @@ -804,17 +811,11 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) { static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) { - unsigned long max_swap_offset; swp_entry_t entry, entry2; pte_t pte; pr_debug("Validating PTE swap exclusive\n"); - - /* See generic_max_swapfile_size(): probe the maximum offset */ - max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); - - /* Create a swp entry with all possible bits set */ - entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); + entry = args->swp_entry; pte = swp_entry_to_pte(entry); WARN_ON(pte_swp_exclusive(pte)); @@ -838,30 +839,36 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) static void __init pte_swap_tests(struct pgtable_debug_args *args) { - swp_entry_t swp; - pte_t pte; + swp_entry_t entry, arch_entry; + pte_t pte, pte2; pr_debug("Validating PTE swap\n"); - pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); - swp = __pte_to_swp_entry(pte); - pte = __swp_entry_to_pte(swp); - WARN_ON(args->fixed_pte_pfn != pte_pfn(pte)); + entry = args->swp_entry; + + pte = swp_entry_to_pte(entry); + WARN_ON(!is_swap_pte(pte)); + arch_entry = __pte_to_swp_entry(pte); + pte2 = __swp_entry_to_pte(arch_entry); + WARN_ON(memcmp(&pte, &pte2, sizeof(pte))); } #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION static void __init pmd_swap_tests(struct pgtable_debug_args *args) { - swp_entry_t swp; - pmd_t pmd; + swp_entry_t entry, arch_entry; + pmd_t pmd, pmd2; if (!has_transparent_hugepage()) return; pr_debug("Validating PMD swap\n"); - pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); - swp = __pmd_to_swp_entry(pmd); - pmd = __swp_entry_to_pmd(swp); - WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)); + entry = args->swp_entry; + pmd = swp_entry_to_pmd(entry); + WARN_ON(!is_swap_pmd(pmd)); + + arch_entry = __pmd_to_swp_entry(pmd); + pmd2 = __swp_entry_to_pmd(arch_entry); + WARN_ON(memcmp(&pmd, &pmd2, sizeof(pmd))); } #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */ static void __init pmd_swap_tests(struct pgtable_debug_args *args) { } @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) static int __init init_args(struct pgtable_debug_args *args) { + unsigned long max_swap_offset; struct page *page = NULL; int ret = 0; @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) init_fixed_pfns(args); + /* See generic_max_swapfile_size(): probe the maximum offset */ + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); + /* Create a swp entry with all possible bits set */ + args->swp_entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); + /* * Allocate (huge) pages because some of the tests need to access * the data in the pages. The corresponding tests will be skipped -- 2.48.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer @ 2025-06-23 19:10 ` David Hildenbrand 2025-06-24 10:40 ` Gerald Schaefer 2025-06-25 4:28 ` Anshuman Khandual 1 sibling, 1 reply; 12+ messages in thread From: David Hildenbrand @ 2025-06-23 19:10 UTC (permalink / raw) To: Gerald Schaefer, Andrew Morton, Anshuman Khandual Cc: Matthew Wilcox, LKML, linux-mm On 23.06.25 20:43, Gerald Schaefer wrote: > The various __pte/pmd_to_swp_entry and __swp_entry_to_pte/pmd helper > functions are expected to operate on swapped PTE/PMD entries, not on > present and mapped entries. > > Reflect this in the swap tests by using a swp_entry_t as input value, > similar to how it is already done in pte_swap_exclusive_tests(). > Move the swap entry creation to init_args() and store it in args, so > it can also be used in other functions. > > The pte/pmd_swap_tests() are also changed to compare entries instead of > pfn values, because pte/pmd_pfn() helpers are not expected to operate on > swapped entries. E.g. on s390, pmd_pfn() needs different shifts for leaf > (large) and non-leaf PMDs. > > Also update documentation, to reflect that the helpers operate on > swapped and not mapped entries, and use correct names, i.e. > __swp_to_pte/pmd_entry -> __swp_entry_to_pte/pmd. > > For consistency, also change pte/pmd_swap_soft_dirty_tests() to use > args->swp_entry instead of a present and mapped PTE/PMD. > > Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > --- > Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- > mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- > 2 files changed, 38 insertions(+), 25 deletions(-) > > diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst > index af245161d8e7..e2ac76202a85 100644 > --- a/Documentation/mm/arch_pgtable_helpers.rst > +++ b/Documentation/mm/arch_pgtable_helpers.rst > @@ -242,13 +242,13 @@ SWAP Page Table Helpers > ======================== > > +---------------------------+--------------------------------------------------+ > -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | > +| __pte_to_swp_entry | Creates a swap entry (arch) from a swapped PTE | Maybe something like: "from a swap (!none && !present) PTE" or short "swap PTE". "swapped" might be misleading. Same for the other cases below. > +---------------------------+--------------------------------------------------+ > -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | > +| __swp_entry_to_pte | Creates a swapped PTE from a swap entry (arch) | > +---------------------------+--------------------------------------------------+ > -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | > +| __pmd_to_swp_entry | Creates a swap entry (arch) from a swapped PMD | > +---------------------------+--------------------------------------------------+ > -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | > +| __swp_entry_to_pmd | Creates a swapped PMD from a swap entry (arch) | > +---------------------------+--------------------------------------------------+ > | is_migration_entry | Tests a migration (read or write) swapped entry | > +-------------------------------+----------------------------------------------+ > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c > index 7731b238b534..3b0f83ed6c2e 100644 > --- a/mm/debug_vm_pgtable.c > +++ b/mm/debug_vm_pgtable.c > @@ -73,6 +73,8 @@ struct pgtable_debug_args { > unsigned long fixed_pud_pfn; > unsigned long fixed_pmd_pfn; > unsigned long fixed_pte_pfn; > + > + swp_entry_t swp_entry; > }; > Nothing else jumped at me, so LGTM. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 19:10 ` David Hildenbrand @ 2025-06-24 10:40 ` Gerald Schaefer 0 siblings, 0 replies; 12+ messages in thread From: Gerald Schaefer @ 2025-06-24 10:40 UTC (permalink / raw) To: David Hildenbrand Cc: Andrew Morton, Anshuman Khandual, Matthew Wilcox, LKML, linux-mm, linux-s390 On Mon, 23 Jun 2025 21:10:54 +0200 David Hildenbrand <david@redhat.com> wrote: > On 23.06.25 20:43, Gerald Schaefer wrote: > > The various __pte/pmd_to_swp_entry and __swp_entry_to_pte/pmd helper > > functions are expected to operate on swapped PTE/PMD entries, not on > > present and mapped entries. > > > > Reflect this in the swap tests by using a swp_entry_t as input value, > > similar to how it is already done in pte_swap_exclusive_tests(). > > Move the swap entry creation to init_args() and store it in args, so > > it can also be used in other functions. > > > > The pte/pmd_swap_tests() are also changed to compare entries instead of > > pfn values, because pte/pmd_pfn() helpers are not expected to operate on > > swapped entries. E.g. on s390, pmd_pfn() needs different shifts for leaf > > (large) and non-leaf PMDs. > > > > Also update documentation, to reflect that the helpers operate on > > swapped and not mapped entries, and use correct names, i.e. > > __swp_to_pte/pmd_entry -> __swp_entry_to_pte/pmd. > > > > For consistency, also change pte/pmd_swap_soft_dirty_tests() to use > > args->swp_entry instead of a present and mapped PTE/PMD. > > > > Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > > --- > > Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- > > mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- > > 2 files changed, 38 insertions(+), 25 deletions(-) > > > > diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst > > index af245161d8e7..e2ac76202a85 100644 > > --- a/Documentation/mm/arch_pgtable_helpers.rst > > +++ b/Documentation/mm/arch_pgtable_helpers.rst > > @@ -242,13 +242,13 @@ SWAP Page Table Helpers > > ======================== > > > > +---------------------------+--------------------------------------------------+ > > -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | > > +| __pte_to_swp_entry | Creates a swap entry (arch) from a swapped PTE | > > Maybe something like: > > "from a swap (!none && !present) PTE" > > or short > > "swap PTE". > > "swapped" might be misleading. > > Same for the other cases below. Right, it already felt awkward when I wrote it, not sure why I only changed it for "swapped entry (arch)". I think I like "swap PTE/PMD", naming the actual entries in the page table, vs. "swap entry (arch)", naming the (arch-dependent) representation of the swap PTE/PMD as swp_entry_t. Will change, and also adjust my description, where I also used possibly misleading "swapped". > > > +---------------------------+--------------------------------------------------+ > > -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | > > +| __swp_entry_to_pte | Creates a swapped PTE from a swap entry (arch) | > > +---------------------------+--------------------------------------------------+ > > -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | > > +| __pmd_to_swp_entry | Creates a swap entry (arch) from a swapped PMD | > > +---------------------------+--------------------------------------------------+ > > -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | > > +| __swp_entry_to_pmd | Creates a swapped PMD from a swap entry (arch) | > > +---------------------------+--------------------------------------------------+ > > | is_migration_entry | Tests a migration (read or write) swapped entry | > > +-------------------------------+----------------------------------------------+ > > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c > > index 7731b238b534..3b0f83ed6c2e 100644 > > --- a/mm/debug_vm_pgtable.c > > +++ b/mm/debug_vm_pgtable.c > > @@ -73,6 +73,8 @@ struct pgtable_debug_args { > > unsigned long fixed_pud_pfn; > > unsigned long fixed_pmd_pfn; > > unsigned long fixed_pte_pfn; > > + > > + swp_entry_t swp_entry; > > }; > > > > Nothing else jumped at me, so LGTM. > Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer 2025-06-23 19:10 ` David Hildenbrand @ 2025-06-25 4:28 ` Anshuman Khandual 2025-06-25 16:28 ` Gerald Schaefer 1 sibling, 1 reply; 12+ messages in thread From: Anshuman Khandual @ 2025-06-25 4:28 UTC (permalink / raw) To: Gerald Schaefer, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, LKML, linux-mm On 24/06/25 12:13 AM, Gerald Schaefer wrote: > The various __pte/pmd_to_swp_entry and __swp_entry_to_pte/pmd helper > functions are expected to operate on swapped PTE/PMD entries, not on > present and mapped entries. > > Reflect this in the swap tests by using a swp_entry_t as input value, > similar to how it is already done in pte_swap_exclusive_tests(). > Move the swap entry creation to init_args() and store it in args, so > it can also be used in other functions. Makes sense. > > The pte/pmd_swap_tests() are also changed to compare entries instead of > pfn values, because pte/pmd_pfn() helpers are not expected to operate on Swap entries compare is also happening now in pte_swap_exclusive_tests(). > swapped entries. E.g. on s390, pmd_pfn() needs different shifts for leaf > (large) and non-leaf PMDs. > > Also update documentation, to reflect that the helpers operate on > swapped and not mapped entries, and use correct names, i.e. > __swp_to_pte/pmd_entry -> __swp_entry_to_pte/pmd. > > For consistency, also change pte/pmd_swap_soft_dirty_tests() to use > args->swp_entry instead of a present and mapped PTE/PMD. Makes sense. > > Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > --- > Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- > mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- > 2 files changed, 38 insertions(+), 25 deletions(-) > > diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst > index af245161d8e7..e2ac76202a85 100644 > --- a/Documentation/mm/arch_pgtable_helpers.rst > +++ b/Documentation/mm/arch_pgtable_helpers.rst > @@ -242,13 +242,13 @@ SWAP Page Table Helpers > ======================== > > +---------------------------+--------------------------------------------------+ > -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | > +| __pte_to_swp_entry | Creates a swap entry (arch) from a swapped PTE | > +---------------------------+--------------------------------------------------+ > -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | > +| __swp_entry_to_pte | Creates a swapped PTE from a swap entry (arch) | > +---------------------------+--------------------------------------------------+ > -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | > +| __pmd_to_swp_entry | Creates a swap entry (arch) from a swapped PMD | > +---------------------------+--------------------------------------------------+ > -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | > +| __swp_entry_to_pmd | Creates a swapped PMD from a swap entry (arch) | > +---------------------------+--------------------------------------------------+ > | is_migration_entry | Tests a migration (read or write) swapped entry | > +-------------------------------+----------------------------------------------+ __pte_to_swp_entry() and __pmd_to_swp_entry() are still being used (and tested) even after applying this patch. Should not their entries be preserved ? > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c > index 7731b238b534..3b0f83ed6c2e 100644 > --- a/mm/debug_vm_pgtable.c > +++ b/mm/debug_vm_pgtable.c > @@ -73,6 +73,8 @@ struct pgtable_debug_args { > unsigned long fixed_pud_pfn; > unsigned long fixed_pmd_pfn; > unsigned long fixed_pte_pfn; > + > + swp_entry_t swp_entry; > }; > > static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx) > @@ -754,12 +756,15 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args) > > static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args) > { > - pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); > + pte_t pte; > > if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) > return; > > pr_debug("Validating PTE swap soft dirty\n"); > + pte = swp_entry_to_pte(args->swp_entry); > + WARN_ON(!is_swap_pte(pte)); > + > WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte))); > WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte))); > } > @@ -793,7 +798,9 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) > return; > > pr_debug("Validating PMD swap soft dirty\n"); > - pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); > + pmd = swp_entry_to_pmd(args->swp_entry); > + WARN_ON(!is_swap_pmd(pmd)); > + > WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd))); > WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd))); > } > @@ -804,17 +811,11 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) { > > static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) > { > - unsigned long max_swap_offset; > swp_entry_t entry, entry2; > pte_t pte; > > pr_debug("Validating PTE swap exclusive\n"); > - > - /* See generic_max_swapfile_size(): probe the maximum offset */ > - max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); > - > - /* Create a swp entry with all possible bits set */ > - entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); > + entry = args->swp_entry; args->swp_entry should be reused here as well. > > pte = swp_entry_to_pte(entry); > WARN_ON(pte_swp_exclusive(pte)); > @@ -838,30 +839,36 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) > > static void __init pte_swap_tests(struct pgtable_debug_args *args) > { > - swp_entry_t swp; > - pte_t pte; > + swp_entry_t entry, arch_entry; > + pte_t pte, pte2; A very small nit - s/pte2/pte as the first one is pmd not pte or make it pte1, pte2 if preferred. > > pr_debug("Validating PTE swap\n"); > - pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); > - swp = __pte_to_swp_entry(pte); > - pte = __swp_entry_to_pte(swp); > - WARN_ON(args->fixed_pte_pfn != pte_pfn(pte)); > + entry = args->swp_entry; Should args->swp_entry be used directly here and 'entry' local variable be dropped ? > + > + pte = swp_entry_to_pte(entry); > + WARN_ON(!is_swap_pte(pte)); > + arch_entry = __pte_to_swp_entry(pte); > + pte2 = __swp_entry_to_pte(arch_entry); > + WARN_ON(memcmp(&pte, &pte2, sizeof(pte))); > } > > #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION > static void __init pmd_swap_tests(struct pgtable_debug_args *args) > { > - swp_entry_t swp; > - pmd_t pmd; > + swp_entry_t entry, arch_entry; > + pmd_t pmd, pmd2; A very small nit - s/pmd2/pmd1 as the first one is pmd not pmd1 or just make it pmd1, pmd2 if preferred. > > if (!has_transparent_hugepage()) > return; > > pr_debug("Validating PMD swap\n"); > - pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); > - swp = __pmd_to_swp_entry(pmd); > - pmd = __swp_entry_to_pmd(swp); > - WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)); > + entry = args->swp_entry; > + pmd = swp_entry_to_pmd(entry); > + WARN_ON(!is_swap_pmd(pmd)); Good to have the WARN_ON() to test whether it is a swap PMD or not. > + > + arch_entry = __pmd_to_swp_entry(pmd); > + pmd2 = __swp_entry_to_pmd(arch_entry); > + WARN_ON(memcmp(&pmd, &pmd2, sizeof(pmd))); Sounds good. > } > #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */ > static void __init pmd_swap_tests(struct pgtable_debug_args *args) { } > @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) > > static int __init init_args(struct pgtable_debug_args *args) > { > + unsigned long max_swap_offset; > struct page *page = NULL; > int ret = 0; > > @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) > > init_fixed_pfns(args); > > + /* See generic_max_swapfile_size(): probe the maximum offset */ > + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); Why not directly use generic_max_swapfile_size() which is doing exact same thing. unsigned long generic_max_swapfile_size(void) { return swp_offset(pte_to_swp_entry( swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; } > + /* Create a swp entry with all possible bits set */ > + args->swp_entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); > + Makes sense to use maximum possible bits while creating the swap entry for testing. > /* > * Allocate (huge) pages because some of the tests need to access > * the data in the pages. The corresponding tests will be skipped ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-25 4:28 ` Anshuman Khandual @ 2025-06-25 16:28 ` Gerald Schaefer 2025-06-25 16:47 ` David Hildenbrand 0 siblings, 1 reply; 12+ messages in thread From: Gerald Schaefer @ 2025-06-25 16:28 UTC (permalink / raw) To: Anshuman Khandual Cc: Andrew Morton, Matthew Wilcox, David Hildenbrand, LKML, linux-mm, linux-s390 On Wed, 25 Jun 2025 09:58:31 +0530 Anshuman Khandual <anshuman.khandual@arm.com> wrote: [...] > > diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst > > index af245161d8e7..e2ac76202a85 100644 > > --- a/Documentation/mm/arch_pgtable_helpers.rst > > +++ b/Documentation/mm/arch_pgtable_helpers.rst > > @@ -242,13 +242,13 @@ SWAP Page Table Helpers > > ======================== > > > > +---------------------------+--------------------------------------------------+ > > -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | > > +| __pte_to_swp_entry | Creates a swap entry (arch) from a swapped PTE | > > +---------------------------+--------------------------------------------------+ > > -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | > > +| __swp_entry_to_pte | Creates a swapped PTE from a swap entry (arch) | > > +---------------------------+--------------------------------------------------+ > > -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | > > +| __pmd_to_swp_entry | Creates a swap entry (arch) from a swapped PMD | > > +---------------------------+--------------------------------------------------+ > > -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | > > +| __swp_entry_to_pmd | Creates a swapped PMD from a swap entry (arch) | > > +---------------------------+--------------------------------------------------+ > > | is_migration_entry | Tests a migration (read or write) swapped entry | > > +-------------------------------+----------------------------------------------+ > > __pte_to_swp_entry() and __pmd_to_swp_entry() are still being used (and tested) > even after applying this patch. Should not their entries be preserved ? Nothing is removed here. Only adjusted description, where David already posted some improvement. And renamed __swp_to_pte/pmd_entry() to the correct names __swp_entry_to_pte/pmd(). [...] > > @@ -804,17 +811,11 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) { > > > > static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) > > { > > - unsigned long max_swap_offset; > > swp_entry_t entry, entry2; > > pte_t pte; > > > > pr_debug("Validating PTE swap exclusive\n"); > > - > > - /* See generic_max_swapfile_size(): probe the maximum offset */ > > - max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); > > - > > - /* Create a swp entry with all possible bits set */ > > - entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); > > + entry = args->swp_entry; > args->swp_entry should be reused here as well. Yes, and it is. I just moved the swap entry creation logic from here to init_args(), and instead use args->swp_entry here. > > > > > pte = swp_entry_to_pte(entry); > > WARN_ON(pte_swp_exclusive(pte)); > > @@ -838,30 +839,36 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args) > > > > static void __init pte_swap_tests(struct pgtable_debug_args *args) > > { > > - swp_entry_t swp; > > - pte_t pte; > > + swp_entry_t entry, arch_entry; > > + pte_t pte, pte2; > A very small nit - s/pte2/pte as the first one is pmd not pte or > make it pte1, pte2 if preferred. Sure, pte1/2 looks better. Same for pmd1/2 in pmd_swap_tests(). > > > > > pr_debug("Validating PTE swap\n"); > > - pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); > > - swp = __pte_to_swp_entry(pte); > > - pte = __swp_entry_to_pte(swp); > > - WARN_ON(args->fixed_pte_pfn != pte_pfn(pte)); > > + entry = args->swp_entry; > > Should args->swp_entry be used directly here and 'entry' local variable > be dropped ? Right, should be possible, also in pmd_swap_tests(). [...] > > @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) > > > > static int __init init_args(struct pgtable_debug_args *args) > > { > > + unsigned long max_swap_offset; > > struct page *page = NULL; > > int ret = 0; > > > > @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) > > > > init_fixed_pfns(args); > > > > + /* See generic_max_swapfile_size(): probe the maximum offset */ > > + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); > Why not directly use generic_max_swapfile_size() which is doing exact same thing. > > unsigned long generic_max_swapfile_size(void) > { > return swp_offset(pte_to_swp_entry( > swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; > } Good question. I just moved this code here from pte_swap_exclusive_tests(), see above, and did not think about that. Now I also wonder why generic_max_swapfile_size() wasn't used before. But it is not exactly the same thing, there is an extra "+ 1" there. Maybe that is the reason, but I don't really understand the details / difference, and therefore would not want to change it. David, do you remember why you didn't use generic_max_swapfile_size() in your pte_swap_exclusive_tests()? > > > + /* Create a swp entry with all possible bits set */ > > + args->swp_entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); > > + > > Makes sense to use maximum possible bits while creating the swap entry for testing. > > > /* > > * Allocate (huge) pages because some of the tests need to access > > * the data in the pages. The corresponding tests will be skipped ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-25 16:28 ` Gerald Schaefer @ 2025-06-25 16:47 ` David Hildenbrand 2025-06-30 4:18 ` Anshuman Khandual 0 siblings, 1 reply; 12+ messages in thread From: David Hildenbrand @ 2025-06-25 16:47 UTC (permalink / raw) To: Gerald Schaefer, Anshuman Khandual Cc: Andrew Morton, Matthew Wilcox, LKML, linux-mm, linux-s390 > [...] >>> @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) >>> >>> static int __init init_args(struct pgtable_debug_args *args) >>> { >>> + unsigned long max_swap_offset; >>> struct page *page = NULL; >>> int ret = 0; >>> >>> @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) >>> >>> init_fixed_pfns(args); >>> >>> + /* See generic_max_swapfile_size(): probe the maximum offset */ >>> + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); >> Why not directly use generic_max_swapfile_size() which is doing exact same thing. >> >> unsigned long generic_max_swapfile_size(void) >> { >> return swp_offset(pte_to_swp_entry( >> swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; >> } > > Good question. I just moved this code here from pte_swap_exclusive_tests(), > see above, and did not think about that. Now I also wonder why > generic_max_swapfile_size() wasn't used before. > > But it is not exactly the same thing, there is an extra "+ 1" there. > Maybe that is the reason, but I don't really understand the details / > difference, and therefore would not want to change it. > > David, do you remember why you didn't use generic_max_swapfile_size() > in your pte_swap_exclusive_tests()? Excellent question. If only I would remember :) generic_max_swapfile_size() resides in mm/swapfile.c, which is only around with CONFIG_SWAP. It makes sense to have that function only if there are ... actual swapfiles. These checks here are independent of CONFIG_SWAP (at least in theory -- for migration entries etc we don't need CONFIG_SWAP), and we simply want to construct a swap PTE with all possible bits set. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-25 16:47 ` David Hildenbrand @ 2025-06-30 4:18 ` Anshuman Khandual 2025-06-30 14:38 ` David Hildenbrand 0 siblings, 1 reply; 12+ messages in thread From: Anshuman Khandual @ 2025-06-30 4:18 UTC (permalink / raw) To: David Hildenbrand, Gerald Schaefer Cc: Andrew Morton, Matthew Wilcox, LKML, linux-mm, linux-s390 On 25/06/25 10:17 PM, David Hildenbrand wrote: >> [...] >>>> @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) >>>> static int __init init_args(struct pgtable_debug_args *args) >>>> { >>>> + unsigned long max_swap_offset; >>>> struct page *page = NULL; >>>> int ret = 0; >>>> @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) >>>> init_fixed_pfns(args); >>>> + /* See generic_max_swapfile_size(): probe the maximum offset */ >>>> + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); >>> Why not directly use generic_max_swapfile_size() which is doing exact same thing. >>> >>> unsigned long generic_max_swapfile_size(void) >>> { >>> return swp_offset(pte_to_swp_entry( >>> swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; >>> } >> >> Good question. I just moved this code here from pte_swap_exclusive_tests(), >> see above, and did not think about that. Now I also wonder why >> generic_max_swapfile_size() wasn't used before. >> >> But it is not exactly the same thing, there is an extra "+ 1" there. >> Maybe that is the reason, but I don't really understand the details / >> difference, and therefore would not want to change it. >> >> David, do you remember why you didn't use generic_max_swapfile_size() >> in your pte_swap_exclusive_tests()? > > Excellent question. If only I would remember :) > > generic_max_swapfile_size() resides in mm/swapfile.c, which is only around with CONFIG_SWAP. > > It makes sense to have that function only if there are ... actual swapfiles. > > These checks here are independent of CONFIG_SWAP (at least in theory -- for migration entries etc we don't need CONFIG_SWAP), and we simply want to construct a swap PTE with all possible bits set. After this modification of PMD based swap test - there will be now two uses for generic_max_swapfile_size(). Rather than refactoring these into a similar helper in mm/debug_vm_pgtable.c - should the existing helper just be moved outside of CONFIG_SWAP, thus making it available in general ? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 1/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-30 4:18 ` Anshuman Khandual @ 2025-06-30 14:38 ` David Hildenbrand 0 siblings, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2025-06-30 14:38 UTC (permalink / raw) To: Anshuman Khandual, Gerald Schaefer Cc: Andrew Morton, Matthew Wilcox, LKML, linux-mm, linux-s390 On 30.06.25 06:18, Anshuman Khandual wrote: > > > On 25/06/25 10:17 PM, David Hildenbrand wrote: >>> [...] >>>>> @@ -1166,6 +1173,7 @@ static void __init init_fixed_pfns(struct pgtable_debug_args *args) >>>>> static int __init init_args(struct pgtable_debug_args *args) >>>>> { >>>>> + unsigned long max_swap_offset; >>>>> struct page *page = NULL; >>>>> int ret = 0; >>>>> @@ -1248,6 +1256,11 @@ static int __init init_args(struct pgtable_debug_args *args) >>>>> init_fixed_pfns(args); >>>>> + /* See generic_max_swapfile_size(): probe the maximum offset */ >>>>> + max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL)))); >>>> Why not directly use generic_max_swapfile_size() which is doing exact same thing. >>>> >>>> unsigned long generic_max_swapfile_size(void) >>>> { >>>> return swp_offset(pte_to_swp_entry( >>>> swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; >>>> } >>> >>> Good question. I just moved this code here from pte_swap_exclusive_tests(), >>> see above, and did not think about that. Now I also wonder why >>> generic_max_swapfile_size() wasn't used before. >>> >>> But it is not exactly the same thing, there is an extra "+ 1" there. >>> Maybe that is the reason, but I don't really understand the details / >>> difference, and therefore would not want to change it. >>> >>> David, do you remember why you didn't use generic_max_swapfile_size() >>> in your pte_swap_exclusive_tests()? >> >> Excellent question. If only I would remember :) >> >> generic_max_swapfile_size() resides in mm/swapfile.c, which is only around with CONFIG_SWAP. >> >> It makes sense to have that function only if there are ... actual swapfiles. >> >> These checks here are independent of CONFIG_SWAP (at least in theory -- for migration entries etc we don't need CONFIG_SWAP), and we simply want to construct a swap PTE with all possible bits set. > > After this modification of PMD based swap test - there will be now > two uses for generic_max_swapfile_size(). Rather than refactoring > these into a similar helper in mm/debug_vm_pgtable.c - should the > existing helper just be moved outside of CONFIG_SWAP, thus making > it available in general ? As I said, having generic_max_swapfile_size() that returns something != 0 with !CONFIG_SWAP feels rather odd. No swap -> no swapfile -> no swapfile max size. No strong opinion though,, but desperately trying to de-duplicate 2 LOC might not be worth the churn at all. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 18:43 [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests Gerald Schaefer 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer @ 2025-06-23 19:06 ` David Hildenbrand 2025-06-24 7:50 ` Anshuman Khandual 2 siblings, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2025-06-23 19:06 UTC (permalink / raw) To: Gerald Schaefer, Andrew Morton, Anshuman Khandual Cc: Matthew Wilcox, LKML, linux-mm On 23.06.25 20:43, Gerald Schaefer wrote: > Hi, > > currently working on enabling THP_SWAP and THP_MIGRATION support for s390, > and stumbling over the WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)) in > debug_vm_pgtable pmd_swap_tests(). The problem is that pmd_pfn() on s390 > will use different shift values for leaf (large) and non-leaf PMDs. And > when used on swapped PMDs, for which pmd_leaf() will always return false > because !pmd_present(), the result is not really well defined. > > I think that pmd_pfn() is not safe or ever meant to be called on swapped > PMD entries, Exactly that. Just like pte_pfn() on a swap entry is bogus. Instead, we can test for is_pfn_swap_entry() and then use swp_offset_pfn/pfn_swap_entry_to_page/pfn_swap_entry_folio. Code in task_mmu.c uses something like swp_entry_t entry = pmd_to_swp_entry(*pmd); if (is_pfn_swap_entry(entry)) page = pfn_swap_entry_to_page(entry); and it doesn't seem to be used in that way anywhere else but > debug_vm_pgtable. Also, the whole logic to test the various swap helpers > on normal PTE/PMD entries seems wrong to me. It just works by chance, > because e.g. __pmd_to_swp_entry() and __swp_entry_to_pmd() are just no-ops > on other architectures (also on s390, but only for PTEs), and also > pmd_pfn() does not have any dependency on leaf/non-leaf entries there. > > So, I started with a small patch to make pmd_swap_tests() use a proper > swapped PMD entry as input value, similar to how it is already done in > pte_swap_exclusive_tests(), and not use pmd_pfn() for compare but rather > compare the whole entries, again similar to pte_swap_exclusive_tests(). > > But then I noticed that such a change would probably also make sense for > the other swap tests, and also a small inconsistency in Documentation, > where it says e.g. > > __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE > > I think this is wrong, those helpers should never operate on present and > mapped PTEs, and they certainly don't create any swapped entry from a > mapped entry, given that they are just no-ops on most architectures. "mapped" is probably misleading. Probably "mapped" as in "this PTE is in the page tables", not "mapped" as in "maps a present page". In any case, it should be clarified. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-23 18:43 [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests Gerald Schaefer 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer 2025-06-23 19:06 ` [RFC PATCH 0/1] " David Hildenbrand @ 2025-06-24 7:50 ` Anshuman Khandual 2025-06-24 10:35 ` Gerald Schaefer 2 siblings, 1 reply; 12+ messages in thread From: Anshuman Khandual @ 2025-06-24 7:50 UTC (permalink / raw) To: Gerald Schaefer, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, LKML, linux-mm Hello Gerald, On 24/06/25 12:13 AM, Gerald Schaefer wrote: > Hi, > > currently working on enabling THP_SWAP and THP_MIGRATION support for s390, > and stumbling over the WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)) in > debug_vm_pgtable pmd_swap_tests(). The problem is that pmd_pfn() on s390 > will use different shift values for leaf (large) and non-leaf PMDs. And > when used on swapped PMDs, for which pmd_leaf() will always return false > because !pmd_present(), the result is not really well defined. Just curious - pmd_pfn() would have otherwise worked on leaf PMD entries ? Because the PMD swap entries are not leaf entries as pmd_present() returns negative, pmd_pfn() does not work on those ? > > I think that pmd_pfn() is not safe or ever meant to be called on swapped > PMD entries, and it doesn't seem to be used in that way anywhere else but > debug_vm_pgtable. Also, the whole logic to test the various swap helpers But is not the pmd_pfn() called on pmd which is derived from the swap entry first. pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); swp = __pmd_to_swp_entry(pmd); pmd = __swp_entry_to_pmd(swp); WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)); > on normal PTE/PMD entries seems wrong to me. It just works by chance, > because e.g. __pmd_to_swp_entry() and __swp_entry_to_pmd() are just no-ops > on other architectures (also on s390, but only for PTEs), and also Hmm, basically it just tests pfn_pmd() and pmd_pfn() conversions ? > pmd_pfn() does not have any dependency on leaf/non-leaf entries there.Could you please elaborate on that ? > > So, I started with a small patch to make pmd_swap_tests() use a proper > swapped PMD entry as input value, similar to how it is already done in > pte_swap_exclusive_tests(), and not use pmd_pfn() for compare but rather > compare the whole entries, again similar to pte_swap_exclusive_tests(). Agreed, that will make sense as well. > > But then I noticed that such a change would probably also make sense for > the other swap tests, and also a small inconsistency in Documentation, > where it says e.g. > > __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE > > I think this is wrong, those helpers should never operate on present and > mapped PTEs, and they certainly don't create any swapped entry from a > mapped entry, given that they are just no-ops on most architectures. > Instead, in this example, it just returns the arch-dependent > representation of a swp_entry_t, which happens to be just the entry > itself on most architectures. See also pte_to_swp_entry() / > swp_entry_to_pte() in include/linux/swapops.h. Alright. > > Now it became a larger clean-up, and I hope it makes sense. This is all > rather new common code for me, so maybe I got things wrong, feedback is > welcome. A quick ran on arm64 looks just fine, will keep looking into this. > > Gerald Schaefer (1): > mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests > > Documentation/mm/arch_pgtable_helpers.rst | 8 ++-- > mm/debug_vm_pgtable.c | 55 ++++++++++++++--------- > 2 files changed, 38 insertions(+), 25 deletions(-) > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests 2025-06-24 7:50 ` Anshuman Khandual @ 2025-06-24 10:35 ` Gerald Schaefer 0 siblings, 0 replies; 12+ messages in thread From: Gerald Schaefer @ 2025-06-24 10:35 UTC (permalink / raw) To: Anshuman Khandual Cc: Andrew Morton, Matthew Wilcox, David Hildenbrand, LKML, linux-mm, linux-s390 On Tue, 24 Jun 2025 13:20:42 +0530 Anshuman Khandual <anshuman.khandual@arm.com> wrote: > Hello Gerald, > > On 24/06/25 12:13 AM, Gerald Schaefer wrote: > > Hi, > > > > currently working on enabling THP_SWAP and THP_MIGRATION support for s390, > > and stumbling over the WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)) in > > debug_vm_pgtable pmd_swap_tests(). The problem is that pmd_pfn() on s390 > > will use different shift values for leaf (large) and non-leaf PMDs. And > > when used on swapped PMDs, for which pmd_leaf() will always return false > > because !pmd_present(), the result is not really well defined. > > Just curious - pmd_pfn() would have otherwise worked on leaf PMD entries ? > Because the PMD swap entries are not leaf entries as pmd_present() returns > negative, pmd_pfn() does not work on those ? Yes, but there are actually two problems with this. The initial pmd that is created with pfn_pmd() is already not leaf/large, but present, so pmd_pfn() would already not work correctly on s390. Later, after the __pmd_to_swp_entry() / __swp_entry_to_pmd() cycle, the present bit got removed because of how those helpers will be implemented for s390. Now it is neither large nor present, and pmd_pfn() will be extra confused. IOW, even if we could implement those helpers as simple no-ops similar to other archs, the check would still not work, even though the PMD would have the present bit set, but it still wouldn't be leaf/large. I guess my description was a bit confusing, since the !pmd_present() case would only show on s390, but it is not the only problem here. I think the point is that those helpers should only be used on "proper" swap PTE/PMD entries, which already cannot be present. And of course that pte/pmd_pfn() is not meant to be used on such entries at all, as David explained. > > > > > I think that pmd_pfn() is not safe or ever meant to be called on swapped > > PMD entries, and it doesn't seem to be used in that way anywhere else but > > debug_vm_pgtable. Also, the whole logic to test the various swap helpers > > But is not the pmd_pfn() called on pmd which is derived from the swap entry > first. > > pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); > swp = __pmd_to_swp_entry(pmd); > pmd = __swp_entry_to_pmd(swp); > WARN_ON(args->fixed_pmd_pfn != pmd_pfn(pmd)); Yes, but this logic is not really testing swap entries. It only works because on other archs the __pmd_to_swp_entry() / __swp_entry_to_pmd() are no-ops, and because pmd_pfn() does not care about leaf/large. > > > on normal PTE/PMD entries seems wrong to me. It just works by chance, > > because e.g. __pmd_to_swp_entry() and __swp_entry_to_pmd() are just no-ops > > on other architectures (also on s390, but only for PTEs), and also > > Hmm, basically it just tests pfn_pmd() and pmd_pfn() conversions ? Correct, but with the extra quirk that the initial PMD created by pfn_pmd() is not leaf/large, which is apparently not a problem on other archs for the pmd_pfn() conversion. Actually, I now wonder why pfn_pmd() would not implicitly mark it as leaf/large already, as it seems that this should only be used for leaf PMDs. But maybe there are some special cases where it could also be used for non-leaf PMDs. > > > pmd_pfn() does not have any dependency on leaf/non-leaf entries there. > Could you please elaborate on that ? As explained above, the initial PMD created by pfn_pmd() is not leaf/large. Well, conceptually it is more or less, but it is not marked as such. This would lead to incorrect pmd_pfn() result (only) on s390. > > > > > So, I started with a small patch to make pmd_swap_tests() use a proper > > swapped PMD entry as input value, similar to how it is already done in > > pte_swap_exclusive_tests(), and not use pmd_pfn() for compare but rather > > compare the whole entries, again similar to pte_swap_exclusive_tests(). > > Agreed, that will make sense as well. > > > > > But then I noticed that such a change would probably also make sense for > > the other swap tests, and also a small inconsistency in Documentation, > > where it says e.g. > > > > __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE > > > > I think this is wrong, those helpers should never operate on present and > > mapped PTEs, and they certainly don't create any swapped entry from a > > mapped entry, given that they are just no-ops on most architectures. > > Instead, in this example, it just returns the arch-dependent > > representation of a swp_entry_t, which happens to be just the entry > > itself on most architectures. See also pte_to_swp_entry() / > > swp_entry_to_pte() in include/linux/swapops.h. > > Alright. > > > > > Now it became a larger clean-up, and I hope it makes sense. This is all > > rather new common code for me, so maybe I got things wrong, feedback is > > welcome. > > A quick ran on arm64 looks just fine, will keep looking into this. Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-30 14:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-23 18:43 [RFC PATCH 0/1] mm/debug_vm_pgtable: Use a swp_entry_t input value for swap tests Gerald Schaefer 2025-06-23 18:43 ` [RFC PATCH 1/1] " Gerald Schaefer 2025-06-23 19:10 ` David Hildenbrand 2025-06-24 10:40 ` Gerald Schaefer 2025-06-25 4:28 ` Anshuman Khandual 2025-06-25 16:28 ` Gerald Schaefer 2025-06-25 16:47 ` David Hildenbrand 2025-06-30 4:18 ` Anshuman Khandual 2025-06-30 14:38 ` David Hildenbrand 2025-06-23 19:06 ` [RFC PATCH 0/1] " David Hildenbrand 2025-06-24 7:50 ` Anshuman Khandual 2025-06-24 10:35 ` Gerald Schaefer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).