linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
@ 2023-08-21 23:48 Zach O'Keefe
  2023-08-22  5:20 ` [EXTERNAL] " Saurabh Singh Sengar
  2023-08-24  7:39 ` David Hildenbrand
  0 siblings, 2 replies; 16+ messages in thread
From: Zach O'Keefe @ 2023-08-21 23:48 UTC (permalink / raw)
  To: linux-mm, Yang Shi; +Cc: linux-kernel, Zach O'Keefe, Saurabh Singh Sengar

The 6.0 commits:

commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")

merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible" checks.

During the process, the semantics of the fault path check changed in two
ways:

1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
   handler that could satisfy the fault.  Previously, this check had been
   done in create_huge_pud() and create_huge_pmd() routines, but after
   the changes, we never reach those routines.

During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic.  However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits.

Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
---
Changed from v2[1]:
	- Fixed false negative in smaps check when !dax && ->huge_fault
Changed from v1[2]:
	- [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists

There are some logical holes in smaps' THPeligible checks here, but those
are best dealt with in follow-up patches.  For now, just make sure the
fault path is dealt with.

[1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
[2] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/

---
 mm/huge_memory.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..901dcf8db8d2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged special VMA and hugetlb VMA.
 	 * Must be checked after dax since some dax mappings may have
 	 * VM_MIXEDMAP set.
 	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
 		return false;
 
 	/*
@@ -128,12 +128,18 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 					   !hugepage_flags_always())))
 		return false;
 
-	/* Only regular file is valid */
-	if (!in_pf && file_thp_enabled(vma))
-		return true;
-
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(vma)) {
+		/*
+		 * Trust that ->huge_fault() handlers know what they are doing
+		 * in fault path.
+		 */
+		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
+			return true;
+		/* Only regular file is valid in collapse path */
+		if (((!in_pf || smaps)) && file_thp_enabled(vma))
+			return true;
 		return false;
+	}
 
 	if (vma_is_temporary_stack(vma))
 		return false;
-- 
2.42.0.rc1.204.g551eb34607-goog



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* RE: [EXTERNAL] [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-21 23:48 [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" Zach O'Keefe
@ 2023-08-22  5:20 ` Saurabh Singh Sengar
  2023-08-24  7:39 ` David Hildenbrand
  1 sibling, 0 replies; 16+ messages in thread
From: Saurabh Singh Sengar @ 2023-08-22  5:20 UTC (permalink / raw)
  To: Zach O'Keefe, linux-mm@kvack.org, Yang Shi
  Cc: linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Zach O'Keefe <zokeefe@google.com>
> Sent: Tuesday, August 22, 2023 5:19 AM
> To: linux-mm@kvack.org; Yang Shi <shy828301@gmail.com>
> Cc: linux-kernel@vger.kernel.org; Zach O'Keefe <zokeefe@google.com>;
> Saurabh Singh Sengar <ssengar@microsoft.com>
> Subject: [EXTERNAL] [PATCH v3] mm/thp: fix "mm: thp: kill
> __transhuge_page_enabled()"
>
> [You don't often get email from zokeefe@google.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()") commit
> 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>
> During the process, the semantics of the fault path check changed in two
> ways:
>
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
> path).
> 2) We no longer checked if non-anonymous memory had a vm_ops-
> >huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
>
> During the review of the above commits, it was determined that in-tree users
> weren't affected by the change; most notably, since the only relevant user (in
> terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is explicitly
> approved early in approval logic.  However, there is at least one occurrence
> where an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
> vm_ops->huge_fault handler, was broken.
>
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we don't
> validate the file mode or mapping alignment, which is consistent with the
> behavior before the aforementioned commits.
>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
> Changed from v2[1]:
>         - Fixed false negative in smaps check when !dax && ->huge_fault
> Changed from v1[2]:
>         - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists
>
> There are some logical holes in smaps' THPeligible checks here, but those are
> best dealt with in follow-up patches.  For now, just make sure the fault path is
> dealt with.
>
> [1]
> https://lore.k/
> ernel.org%2Flinux-mm%2F20230818211533.2523697-1-
> zokeefe%40google.com%2F&data=05%7C01%7Cssengar%40microsoft.com%7
> Ce782558e7bce4f9d060608dba2a12b58%7C72f988bf86f141af91ab2d7cd011
> db47%7C1%7C0%7C638282585367952964%7CUnknown%7CTWFpbGZsb3d8e
> yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D
> %7C3000%7C%7C%7C&sdata=t%2FwAGlOyKmKp%2FnDPGv9cl2j3h%2F3xVuV
> Y%2BQqeu3A4HHk%3D&reserved=0
> [2]
> https://lore.k/
> ernel.org%2Flinux-
> mm%2FCAAa6QmQw%2BF%3Do6htOn%3D6ADD6mwvMO%3DOw_67f3ifBv3
> GpXx9Xg_g%40mail.gmail.com%2F&data=05%7C01%7Cssengar%40microsoft.
> com%7Ce782558e7bce4f9d060608dba2a12b58%7C72f988bf86f141af91ab2d
> 7cd011db47%7C1%7C0%7C638282585367952964%7CUnknown%7CTWFpbGZ
> sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6M
> n0%3D%7C3000%7C%7C%7C&sdata=lT6ZqrOBoVIcPbOH%2BHto5pPTmpC6pk
> QMu58gnKG7aLo%3D&reserved=0
>
> ---
>  mm/huge_memory.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index
> eb3678360b97..901dcf8db8d2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct
> *vma, unsigned long vm_flags,
>                 return in_pf;
>
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>                 return false;
>
>         /*
> @@ -128,12 +128,18 @@ bool hugepage_vma_check(struct vm_area_struct
> *vma, unsigned long vm_flags,
>                                            !hugepage_flags_always())))
>                 return false;
>
> -       /* Only regular file is valid */
> -       if (!in_pf && file_thp_enabled(vma))
> -               return true;
> -
> -       if (!vma_is_anonymous(vma))
> +       if (!vma_is_anonymous(vma)) {
> +               /*
> +                * Trust that ->huge_fault() handlers know what they are doing
> +                * in fault path.
> +                */
> +               if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> +                       return true;

Thanks for the patch. I have tested it, looks good to me.

- Saurabh

> +               /* Only regular file is valid in collapse path */
> +               if (((!in_pf || smaps)) && file_thp_enabled(vma))
> +                       return true;
>                 return false;
> +       }
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
> --
> 2.42.0.rc1.204.g551eb34607-goog



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-21 23:48 [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" Zach O'Keefe
  2023-08-22  5:20 ` [EXTERNAL] " Saurabh Singh Sengar
@ 2023-08-24  7:39 ` David Hildenbrand
  2023-08-24 13:59   ` Zach O'Keefe
  1 sibling, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-24  7:39 UTC (permalink / raw)
  To: Zach O'Keefe, linux-mm, Yang Shi
  Cc: linux-kernel, Saurabh Singh Sengar, Greg KH

On 22.08.23 01:48, Zach O'Keefe wrote:
> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> 
> During the process, the semantics of the fault path check changed in two
> ways:
> 
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>     handler that could satisfy the fault.  Previously, this check had been
>     done in create_huge_pud() and create_huge_pmd() routines, but after
>     the changes, we never reach those routines.
> 
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic.  However, there is at least
> one occurrence where an out-of-tree driver that used
> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

... so all we did is break an arbitrary out-of-tree driver? Sorry to 
say, but why should we care?

Is there any in-tree code affected and needs a "Fixes:" ?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-24  7:39 ` David Hildenbrand
@ 2023-08-24 13:59   ` Zach O'Keefe
  2023-08-24 14:05     ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Zach O'Keefe @ 2023-08-24 13:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Yang Shi, linux-kernel, Saurabh Singh Sengar, Greg KH

On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.23 01:48, Zach O'Keefe wrote:
> > The 6.0 commits:
> >
> > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >
> > merged "can we have THPs in this VMA?" logic that was previously done
> > separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >
> > During the process, the semantics of the fault path check changed in two
> > ways:
> >
> > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >     handler that could satisfy the fault.  Previously, this check had been
> >     done in create_huge_pud() and create_huge_pmd() routines, but after
> >     the changes, we never reach those routines.
> >
> > During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic.  However, there is at least
> > one occurrence where an out-of-tree driver that used
> > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
>
> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
> say, but why should we care?
>
> Is there any in-tree code affected and needs a "Fixes:" ?

The in-tree code was taken care of during the rework .. but I didn't
know about the possibility of a driver hooking in here.

I don't know what the normal policy / stance here is, but I figured
the change was simple enough that it was worth helping out.

For both VM_MIXEDMAP and !DAX ->huge_fault, there is some argument to
be made that they are unnecessarily restrictive anyways.

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-24 13:59   ` Zach O'Keefe
@ 2023-08-24 14:05     ` David Hildenbrand
  2023-08-24 14:47       ` Zach O'Keefe
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-24 14:05 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: linux-mm, Yang Shi, linux-kernel, Saurabh Singh Sengar, Greg KH

On 24.08.23 15:59, Zach O'Keefe wrote:
> On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.08.23 01:48, Zach O'Keefe wrote:
>>> The 6.0 commits:
>>>
>>> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
>>> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>>>
>>> merged "can we have THPs in this VMA?" logic that was previously done
>>> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>>>
>>> During the process, the semantics of the fault path check changed in two
>>> ways:
>>>
>>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
>>> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>>>      handler that could satisfy the fault.  Previously, this check had been
>>>      done in create_huge_pud() and create_huge_pmd() routines, but after
>>>      the changes, we never reach those routines.
>>>
>>> During the review of the above commits, it was determined that in-tree
>>> users weren't affected by the change; most notably, since the only relevant
>>> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
>>> explicitly approved early in approval logic.  However, there is at least
>>> one occurrence where an out-of-tree driver that used
>>> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
>>
>> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
>> say, but why should we care?
>>
>> Is there any in-tree code affected and needs a "Fixes:" ?
> 
> The in-tree code was taken care of during the rework .. but I didn't
> know about the possibility of a driver hooking in here.

And that's the problem of the driver, no? It's not the job of the kernel 
developers to be aware of each out-of-tree driver to not accidentally 
break something in there.

> 
> I don't know what the normal policy / stance here is, but I figured
> the change was simple enough that it was worth helping out.

If you decide to be out-of-tree, then you have be prepared to only 
support tested kernels and fix your driver when something changes 
upstream -- like upstream developers would do for you when it would be 
in-tree.

So why can't the out-of-tree driver be fixed, similarly to how we would 
have fixed it if it would be in-tree?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-24 14:05     ` David Hildenbrand
@ 2023-08-24 14:47       ` Zach O'Keefe
  2023-08-24 15:39         ` [EXTERNAL] " Saurabh Singh Sengar
  0 siblings, 1 reply; 16+ messages in thread
From: Zach O'Keefe @ 2023-08-24 14:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Yang Shi, linux-kernel, Saurabh Singh Sengar, Greg KH

On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.08.23 15:59, Zach O'Keefe wrote:
> > On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 22.08.23 01:48, Zach O'Keefe wrote:
> >>> The 6.0 commits:
> >>>
> >>> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> >>> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >>>
> >>> merged "can we have THPs in this VMA?" logic that was previously done
> >>> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >>>
> >>> During the process, the semantics of the fault path check changed in two
> >>> ways:
> >>>
> >>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> >>> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >>>      handler that could satisfy the fault.  Previously, this check had been
> >>>      done in create_huge_pud() and create_huge_pmd() routines, but after
> >>>      the changes, we never reach those routines.
> >>>
> >>> During the review of the above commits, it was determined that in-tree
> >>> users weren't affected by the change; most notably, since the only relevant
> >>> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> >>> explicitly approved early in approval logic.  However, there is at least
> >>> one occurrence where an out-of-tree driver that used
> >>> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> >>
> >> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
> >> say, but why should we care?
> >>
> >> Is there any in-tree code affected and needs a "Fixes:" ?
> >
> > The in-tree code was taken care of during the rework .. but I didn't
> > know about the possibility of a driver hooking in here.
>
> And that's the problem of the driver, no? It's not the job of the kernel
> developers to be aware of each out-of-tree driver to not accidentally
> break something in there.
>
> >
> > I don't know what the normal policy / stance here is, but I figured
> > the change was simple enough that it was worth helping out.
>
> If you decide to be out-of-tree, then you have be prepared to only
> support tested kernels and fix your driver when something changes
> upstream -- like upstream developers would do for you when it would be
> in-tree.
>
> So why can't the out-of-tree driver be fixed, similarly to how we would
> have fixed it if it would be in-tree?

I don't know much about driver development, but perhaps they are /
need to use a pristine upstream kernel, with their driver as a
loadable kernel module. Saurabh can comment on this, I don't know.

But your point is very valid otherwise.


> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-24 14:47       ` Zach O'Keefe
@ 2023-08-24 15:39         ` Saurabh Singh Sengar
  2023-08-25  7:59           ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Saurabh Singh Sengar @ 2023-08-24 15:39 UTC (permalink / raw)
  To: Zach O'Keefe, David Hildenbrand
  Cc: linux-mm@kvack.org, Yang Shi, linux-kernel@vger.kernel.org,
	Greg KH, Saurabh Sengar

> On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com>
> wrote:
> >
> > On 24.08.23 15:59, Zach O'Keefe wrote:
> > > On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand
> <david@redhat.com> wrote:
> > >>
> > >> On 22.08.23 01:48, Zach O'Keefe wrote:
> > >>> The 6.0 commits:
> > >>>
> > >>> commit 9fec51689ff6 ("mm: thp: kill
> > >>> transparent_hugepage_active()") commit 7da4e2cb8b1f ("mm: thp:
> > >>> kill __transhuge_page_enabled()")
> > >>>
> > >>> merged "can we have THPs in this VMA?" logic that was previously
> > >>> done separately by fault-path, khugepaged, and smaps "THPeligible"
> checks.
> > >>>
> > >>> During the process, the semantics of the fault path check changed
> > >>> in two
> > >>> ways:
> > >>>
> > >>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
> path).
> > >>> 2) We no longer checked if non-anonymous memory had a vm_ops-
> >huge_fault
> > >>>      handler that could satisfy the fault.  Previously, this check had been
> > >>>      done in create_huge_pud() and create_huge_pmd() routines, but
> after
> > >>>      the changes, we never reach those routines.
> > >>>
> > >>> During the review of the above commits, it was determined that
> > >>> in-tree users weren't affected by the change; most notably, since
> > >>> the only relevant user (in terms of THP) of VM_MIXEDMAP or
> > >>> ->huge_fault is DAX, which is explicitly approved early in
> > >>> approval logic.  However, there is at least one occurrence where
> > >>> an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
> vm_ops->huge_fault handler, was broken.
> > >>
> > >> ... so all we did is break an arbitrary out-of-tree driver? Sorry
> > >> to say, but why should we care?
> > >>
> > >> Is there any in-tree code affected and needs a "Fixes:" ?
> > >
> > > The in-tree code was taken care of during the rework .. but I didn't
> > > know about the possibility of a driver hooking in here.
> >
> > And that's the problem of the driver, no? It's not the job of the
> > kernel developers to be aware of each out-of-tree driver to not
> > accidentally break something in there.
> >
> > >
> > > I don't know what the normal policy / stance here is, but I figured
> > > the change was simple enough that it was worth helping out.
> >
> > If you decide to be out-of-tree, then you have be prepared to only
> > support tested kernels and fix your driver when something changes
> > upstream -- like upstream developers would do for you when it would be
> > in-tree.
> >
> > So why can't the out-of-tree driver be fixed, similarly to how we
> > would have fixed it if it would be in-tree?
> 
> I don't know much about driver development, but perhaps they are / need to
> use a pristine upstream kernel, with their driver as a loadable kernel module.
> Saurabh can comment on this, I don't know.

You are correct Zach. The "out-of-tree" driver had been seamlessly operational
on version 5.19, leveraging the kernel's capability to handle huge faults for
non-anonymous vma. However, the transition to kernel version 6.1 inadvertently
led to the removal of this feature. It's important to note that this removal wasn't
intentional, and I am optimistic about the potential restoration of this feature.

Hello David,

Given the context, I am currently exploring potential ways to address the issue
with the "out-of-tree" driver. I have recognized a challenge posed by the kernel's
memory management framework, which now imposes restrictions on huge faults
for non-anonymous vma. My inclination to contribute this driver to the mainline
kernel remains strong. If there's a possibility of engaging in discussions or
collaborative efforts to align this driver with the present framework, I'm fully
committed to the process. Your suggestions would be greatly appreciated, as I
am eager to ensure the compatibility of the "out-of-tree" driver with the kernel's
evolving framework.

- Saurabh

> 
> But your point is very valid otherwise.
> 
> 
> > --
> > Cheers,
> >
> > David / dhildenb
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-24 15:39         ` [EXTERNAL] " Saurabh Singh Sengar
@ 2023-08-25  7:59           ` David Hildenbrand
  2023-08-25 12:49             ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-25  7:59 UTC (permalink / raw)
  To: Saurabh Singh Sengar, Zach O'Keefe
  Cc: linux-mm@kvack.org, Yang Shi, linux-kernel@vger.kernel.org,
	Greg KH, Saurabh Sengar, Matthew Wilcox

On 24.08.23 17:39, Saurabh Singh Sengar wrote:
>> On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com>
>> wrote:
>>>
>>> On 24.08.23 15:59, Zach O'Keefe wrote:
>>>> On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand
>> <david@redhat.com> wrote:
>>>>>
>>>>> On 22.08.23 01:48, Zach O'Keefe wrote:
>>>>>> The 6.0 commits:
>>>>>>
>>>>>> commit 9fec51689ff6 ("mm: thp: kill
>>>>>> transparent_hugepage_active()") commit 7da4e2cb8b1f ("mm: thp:
>>>>>> kill __transhuge_page_enabled()")
>>>>>>
>>>>>> merged "can we have THPs in this VMA?" logic that was previously
>>>>>> done separately by fault-path, khugepaged, and smaps "THPeligible"
>> checks.
>>>>>>
>>>>>> During the process, the semantics of the fault path check changed
>>>>>> in two
>>>>>> ways:
>>>>>>
>>>>>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
>> path).
>>>>>> 2) We no longer checked if non-anonymous memory had a vm_ops-
>>> huge_fault
>>>>>>       handler that could satisfy the fault.  Previously, this check had been
>>>>>>       done in create_huge_pud() and create_huge_pmd() routines, but
>> after
>>>>>>       the changes, we never reach those routines.
>>>>>>
>>>>>> During the review of the above commits, it was determined that
>>>>>> in-tree users weren't affected by the change; most notably, since
>>>>>> the only relevant user (in terms of THP) of VM_MIXEDMAP or
>>>>>> ->huge_fault is DAX, which is explicitly approved early in
>>>>>> approval logic.  However, there is at least one occurrence where
>>>>>> an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
>> vm_ops->huge_fault handler, was broken.
>>>>>
>>>>> ... so all we did is break an arbitrary out-of-tree driver? Sorry
>>>>> to say, but why should we care?
>>>>>
>>>>> Is there any in-tree code affected and needs a "Fixes:" ?
>>>>
>>>> The in-tree code was taken care of during the rework .. but I didn't
>>>> know about the possibility of a driver hooking in here.
>>>
>>> And that's the problem of the driver, no? It's not the job of the
>>> kernel developers to be aware of each out-of-tree driver to not
>>> accidentally break something in there.
>>>
>>>>
>>>> I don't know what the normal policy / stance here is, but I figured
>>>> the change was simple enough that it was worth helping out.
>>>
>>> If you decide to be out-of-tree, then you have be prepared to only
>>> support tested kernels and fix your driver when something changes
>>> upstream -- like upstream developers would do for you when it would be
>>> in-tree.
>>>
>>> So why can't the out-of-tree driver be fixed, similarly to how we
>>> would have fixed it if it would be in-tree?
>>
>> I don't know much about driver development, but perhaps they are / need to
>> use a pristine upstream kernel, with their driver as a loadable kernel module.
>> Saurabh can comment on this, I don't know.
> 

Hi!

> You are correct Zach. The "out-of-tree" driver had been seamlessly operational
> on version 5.19, leveraging the kernel's capability to handle huge faults for
> non-anonymous vma. However, the transition to kernel version 6.1 inadvertently
> led to the removal of this feature. It's important to note that this removal wasn't
> intentional, and I am optimistic about the potential restoration of this feature.

We are currently creating 6.5, and are being told that a patch in 6.0 
(released almost one year ago!) broke an out-of-tree driver.

Being that back-level, you cannot possibly expect that the upstream 
community can seriously care about not breaking your OOT driver in each 
release.

Especially, we do have bigger ->huge_fault changes coming up:

https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org

If the driver is not in the tree, people don't care.

You really should try upstreaming that driver.


So this patch here adds complexity (which I don't like) in order to keep 
an OOT driver working -- possibly for a short time. I'm tempted to say 
"please fix your driver to not use huge faults in that scenario, it is 
no longer supported".

But I'm just about to vanish for 1.5 week into vacation :)

@Willy, what are your thoughts?

In any case, I think we should drop the "Fixes" tag. This does not fix 
any kernel BUG -- it restores compatibility with an OOT driver -- and 
already confused people building distributions and asking about this fix ;)


> 
> Hello David,
> 
> Given the context, I am currently exploring potential ways to address the issue
> with the "out-of-tree" driver. I have recognized a challenge posed by the kernel's
> memory management framework, which now imposes restrictions on huge faults
> for non-anonymous vma.

You should try upstreaming your driver possibly without huge fault 
support, and then separately try re-adding huge fault support and see if 
kernel people want to support that or not.

And if your driver *really* requires huge faults, then supporting that 
would be part of your series when upstreaming that driver.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-25  7:59           ` David Hildenbrand
@ 2023-08-25 12:49             ` Matthew Wilcox
  2023-08-25 12:58               ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-08-25 12:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Saurabh Singh Sengar, Zach O'Keefe, linux-mm@kvack.org,
	Yang Shi, linux-kernel@vger.kernel.org, Greg KH, Saurabh Sengar

On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> Especially, we do have bigger ->huge_fault changes coming up:
> 
> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> 
> If the driver is not in the tree, people don't care.
> 
> You really should try upstreaming that driver.
> 
> 
> So this patch here adds complexity (which I don't like) in order to keep an
> OOT driver working -- possibly for a short time. I'm tempted to say "please
> fix your driver to not use huge faults in that scenario, it is no longer
> supported".
> 
> But I'm just about to vanish for 1.5 week into vacation :)
> 
> @Willy, what are your thoughts?

Fundamentally there was a bad assumption with the original patch --
it assumed that the only reason to support ->huge_fault was for DAX,
and that's not true.  It's just that the only drivers in-tree which
support ->huge_fault do so in order to support DAX.

Keeping a driver out of tree is always a risky and costly proposition.
It will continue to be broken by core kernel changes, particularly
if/when it does unusual things.

I think the complexity is entirely on us.  I think there's a simpler way
to handle the problem, but I'd start by turning all of this "admin and
app get to control when THP are used" nonsense into no-ops.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-25 12:49             ` Matthew Wilcox
@ 2023-08-25 12:58               ` David Hildenbrand
  2023-08-25 15:09                 ` Zach O'Keefe
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-25 12:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Saurabh Singh Sengar, Zach O'Keefe, linux-mm@kvack.org,
	Yang Shi, linux-kernel@vger.kernel.org, Greg KH, Saurabh Sengar

On 25.08.23 14:49, Matthew Wilcox wrote:
> On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
>> Especially, we do have bigger ->huge_fault changes coming up:
>>
>> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
>>
>> If the driver is not in the tree, people don't care.
>>
>> You really should try upstreaming that driver.
>>
>>
>> So this patch here adds complexity (which I don't like) in order to keep an
>> OOT driver working -- possibly for a short time. I'm tempted to say "please
>> fix your driver to not use huge faults in that scenario, it is no longer
>> supported".
>>
>> But I'm just about to vanish for 1.5 week into vacation :)
>>
>> @Willy, what are your thoughts?
> 
> Fundamentally there was a bad assumption with the original patch --
> it assumed that the only reason to support ->huge_fault was for DAX,
> and that's not true.  It's just that the only drivers in-tree which
> support ->huge_fault do so in order to support DAX.

Okay, and we are willing to continue supporting that then and it's 
nothing we want to stop OOT drivers from doing.

Fine with me; we should probably reflect that in the patch description.

> 
> Keeping a driver out of tree is always a risky and costly proposition.
> It will continue to be broken by core kernel changes, particularly
> if/when it does unusual things.
> 

Yes.

> I think the complexity is entirely on us.  I think there's a simpler way
> to handle the problem, but I'd start by turning all of this "admin and
> app get to control when THP are used" nonsense into no-ops.

Well, simpler, yes, but also more controversial :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-25 12:58               ` David Hildenbrand
@ 2023-08-25 15:09                 ` Zach O'Keefe
  2023-09-06  6:58                   ` Saurabh Singh Sengar
  0 siblings, 1 reply; 16+ messages in thread
From: Zach O'Keefe @ 2023-08-25 15:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, Saurabh Singh Sengar, linux-mm@kvack.org,
	Yang Shi, linux-kernel@vger.kernel.org, Greg KH, Saurabh Sengar

On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.08.23 14:49, Matthew Wilcox wrote:
> > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> >> Especially, we do have bigger ->huge_fault changes coming up:
> >>
> >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org

FWIW, one of those patches updates the docs to read,

"->huge_fault() is called when there is no PUD or PMD entry present.  This
gives the filesystem the opportunity to install a PUD or PMD sized page.
Filesystems can also use the ->fault method to return a PMD sized page,
so implementing this function may not be necessary.  In particular,
filesystems should not call filemap_fault() from ->huge_fault(). [..]"

Which won't work (in the general case) without this patch (well, at
least the ->huge_fault() check part).

So, if we're advertising this is the way it works, maybe that gives a
stronger argument for addressing it sooner vs when the first in-tree
user depends on it?

> >> If the driver is not in the tree, people don't care.
> >>
> >> You really should try upstreaming that driver.
> >>
> >>
> >> So this patch here adds complexity (which I don't like) in order to keep an
> >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> >> fix your driver to not use huge faults in that scenario, it is no longer
> >> supported".
> >>
> >> But I'm just about to vanish for 1.5 week into vacation :)
> >>
> >> @Willy, what are your thoughts?
> >
> > Fundamentally there was a bad assumption with the original patch --
> > it assumed that the only reason to support ->huge_fault was for DAX,
> > and that's not true.  It's just that the only drivers in-tree which
> > support ->huge_fault do so in order to support DAX.
>
> Okay, and we are willing to continue supporting that then and it's
> nothing we want to stop OOT drivers from doing.
>
> Fine with me; we should probably reflect that in the patch description.

I can change these paragraphs,

"During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic.  However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits."

To read,

"The above commits, however, overfit the existing in-tree use cases,
and assume that
the only reason to support ->huge_fault was for DAX (which is
explicitly approved early in the approval logic).
This is a bad assumption to make and unnecessarily prevents general
support of ->huge_fault by filesystems. Allow returning "true" if such
a handler exists, giving the fault path an opportunity to exercise it.

Similarly, the rationale for including the VM_NO_KHUGEPAGED check
along the fault path was that it didn't alter any in-tree users, but
was likewise similarly unnecessarily restrictive (and reads odd).
Remove the check from the fault path."

> >
> > Keeping a driver out of tree is always a risky and costly proposition.
> > It will continue to be broken by core kernel changes, particularly
> > if/when it does unusual things.
> >
>
> Yes.
>
> > I think the complexity is entirely on us.  I think there's a simpler way
> > to handle the problem, but I'd start by turning all of this "admin and
> > app get to control when THP are used" nonsense into no-ops.
>
> Well, simpler, yes, but also more controversial :)
>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-08-25 15:09                 ` Zach O'Keefe
@ 2023-09-06  6:58                   ` Saurabh Singh Sengar
  2023-09-20  5:44                     ` Saurabh Singh Sengar
  0 siblings, 1 reply; 16+ messages in thread
From: Saurabh Singh Sengar @ 2023-09-06  6:58 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: David Hildenbrand, Matthew Wilcox, Saurabh Singh Sengar,
	linux-mm@kvack.org, Yang Shi, linux-kernel@vger.kernel.org,
	Greg KH

On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > >> Especially, we do have bigger ->huge_fault changes coming up:
> > >>
> > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> 
> FWIW, one of those patches updates the docs to read,
> 
> "->huge_fault() is called when there is no PUD or PMD entry present.  This
> gives the filesystem the opportunity to install a PUD or PMD sized page.
> Filesystems can also use the ->fault method to return a PMD sized page,
> so implementing this function may not be necessary.  In particular,
> filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> 
> Which won't work (in the general case) without this patch (well, at
> least the ->huge_fault() check part).
> 
> So, if we're advertising this is the way it works, maybe that gives a
> stronger argument for addressing it sooner vs when the first in-tree
> user depends on it?
> 
> > >> If the driver is not in the tree, people don't care.
> > >>
> > >> You really should try upstreaming that driver.
> > >>
> > >>
> > >> So this patch here adds complexity (which I don't like) in order to keep an
> > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > >> fix your driver to not use huge faults in that scenario, it is no longer
> > >> supported".
> > >>
> > >> But I'm just about to vanish for 1.5 week into vacation :)
> > >>
> > >> @Willy, what are your thoughts?
> > >
> > > Fundamentally there was a bad assumption with the original patch --
> > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > and that's not true.  It's just that the only drivers in-tree which
> > > support ->huge_fault do so in order to support DAX.
> >
> > Okay, and we are willing to continue supporting that then and it's
> > nothing we want to stop OOT drivers from doing.
> >
> > Fine with me; we should probably reflect that in the patch description.
> 
> I can change these paragraphs,
> 
> "During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic.  However, there is at least
> one occurrence where an out-of-tree driver that used
> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> 
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits."
> 
> To read,
> 
> "The above commits, however, overfit the existing in-tree use cases,
> and assume that
> the only reason to support ->huge_fault was for DAX (which is
> explicitly approved early in the approval logic).
> This is a bad assumption to make and unnecessarily prevents general
> support of ->huge_fault by filesystems. Allow returning "true" if such
> a handler exists, giving the fault path an opportunity to exercise it.
> 
> Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> along the fault path was that it didn't alter any in-tree users, but
> was likewise similarly unnecessarily restrictive (and reads odd).
> Remove the check from the fault path."
>


Any chance this can make it to 6.6 kernel ?

- Saurabh 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-09-06  6:58                   ` Saurabh Singh Sengar
@ 2023-09-20  5:44                     ` Saurabh Singh Sengar
  2023-09-22 16:54                       ` Yang Shi
  0 siblings, 1 reply; 16+ messages in thread
From: Saurabh Singh Sengar @ 2023-09-20  5:44 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: David Hildenbrand, Matthew Wilcox, Saurabh Singh Sengar,
	linux-mm@kvack.org, Yang Shi, linux-kernel@vger.kernel.org,
	Greg KH

On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > >>
> > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > 
> > FWIW, one of those patches updates the docs to read,
> > 
> > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > Filesystems can also use the ->fault method to return a PMD sized page,
> > so implementing this function may not be necessary.  In particular,
> > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > 
> > Which won't work (in the general case) without this patch (well, at
> > least the ->huge_fault() check part).
> > 
> > So, if we're advertising this is the way it works, maybe that gives a
> > stronger argument for addressing it sooner vs when the first in-tree
> > user depends on it?
> > 
> > > >> If the driver is not in the tree, people don't care.
> > > >>
> > > >> You really should try upstreaming that driver.
> > > >>
> > > >>
> > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > >> supported".
> > > >>
> > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > >>
> > > >> @Willy, what are your thoughts?
> > > >
> > > > Fundamentally there was a bad assumption with the original patch --
> > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > and that's not true.  It's just that the only drivers in-tree which
> > > > support ->huge_fault do so in order to support DAX.
> > >
> > > Okay, and we are willing to continue supporting that then and it's
> > > nothing we want to stop OOT drivers from doing.
> > >
> > > Fine with me; we should probably reflect that in the patch description.
> > 
> > I can change these paragraphs,
> > 
> > "During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic.  However, there is at least
> > one occurrence where an out-of-tree driver that used
> > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > 
> > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > any ->huge_fault handler a chance to handle the fault.  Note that we
> > don't validate the file mode or mapping alignment, which is consistent
> > with the behavior before the aforementioned commits."
> > 
> > To read,
> > 
> > "The above commits, however, overfit the existing in-tree use cases,
> > and assume that
> > the only reason to support ->huge_fault was for DAX (which is
> > explicitly approved early in the approval logic).
> > This is a bad assumption to make and unnecessarily prevents general
> > support of ->huge_fault by filesystems. Allow returning "true" if such
> > a handler exists, giving the fault path an opportunity to exercise it.
> > 
> > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > along the fault path was that it didn't alter any in-tree users, but
> > was likewise similarly unnecessarily restrictive (and reads odd).
> > Remove the check from the fault path."
> >
> 
> 
> Any chance this can make it to 6.6 kernel ?

ping

> 
> - Saurabh 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-09-20  5:44                     ` Saurabh Singh Sengar
@ 2023-09-22 16:54                       ` Yang Shi
  2023-09-22 16:56                         ` Zach O'Keefe
  0 siblings, 1 reply; 16+ messages in thread
From: Yang Shi @ 2023-09-22 16:54 UTC (permalink / raw)
  To: Saurabh Singh Sengar, Andrew Morton
  Cc: Zach O'Keefe, David Hildenbrand, Matthew Wilcox,
	Saurabh Singh Sengar, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Greg KH

On Tue, Sep 19, 2023 at 10:44 PM Saurabh Singh Sengar
<ssengar@linux.microsoft.com> wrote:
>
> On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> > On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > > >>
> > > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > >
> > > FWIW, one of those patches updates the docs to read,
> > >
> > > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > > Filesystems can also use the ->fault method to return a PMD sized page,
> > > so implementing this function may not be necessary.  In particular,
> > > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > >
> > > Which won't work (in the general case) without this patch (well, at
> > > least the ->huge_fault() check part).
> > >
> > > So, if we're advertising this is the way it works, maybe that gives a
> > > stronger argument for addressing it sooner vs when the first in-tree
> > > user depends on it?
> > >
> > > > >> If the driver is not in the tree, people don't care.
> > > > >>
> > > > >> You really should try upstreaming that driver.
> > > > >>
> > > > >>
> > > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > > >> supported".
> > > > >>
> > > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > > >>
> > > > >> @Willy, what are your thoughts?
> > > > >
> > > > > Fundamentally there was a bad assumption with the original patch --
> > > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > > and that's not true.  It's just that the only drivers in-tree which
> > > > > support ->huge_fault do so in order to support DAX.
> > > >
> > > > Okay, and we are willing to continue supporting that then and it's
> > > > nothing we want to stop OOT drivers from doing.
> > > >
> > > > Fine with me; we should probably reflect that in the patch description.
> > >
> > > I can change these paragraphs,
> > >
> > > "During the review of the above commits, it was determined that in-tree
> > > users weren't affected by the change; most notably, since the only relevant
> > > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > > explicitly approved early in approval logic.  However, there is at least
> > > one occurrence where an out-of-tree driver that used
> > > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > >
> > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > > any ->huge_fault handler a chance to handle the fault.  Note that we
> > > don't validate the file mode or mapping alignment, which is consistent
> > > with the behavior before the aforementioned commits."
> > >
> > > To read,
> > >
> > > "The above commits, however, overfit the existing in-tree use cases,
> > > and assume that
> > > the only reason to support ->huge_fault was for DAX (which is
> > > explicitly approved early in the approval logic).
> > > This is a bad assumption to make and unnecessarily prevents general
> > > support of ->huge_fault by filesystems. Allow returning "true" if such
> > > a handler exists, giving the fault path an opportunity to exercise it.
> > >
> > > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > > along the fault path was that it didn't alter any in-tree users, but
> > > was likewise similarly unnecessarily restrictive (and reads odd).
> > > Remove the check from the fault path."
> > >
> >
> >
> > Any chance this can make it to 6.6 kernel ?
>
> ping

I think we tend to merge this patch, but anyway it is Andrew's call.
Included Andrew in this loop.

>
> >
> > - Saurabh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-09-22 16:54                       ` Yang Shi
@ 2023-09-22 16:56                         ` Zach O'Keefe
  2023-09-22 17:20                           ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Zach O'Keefe @ 2023-09-22 16:56 UTC (permalink / raw)
  To: Yang Shi
  Cc: Saurabh Singh Sengar, Andrew Morton, David Hildenbrand,
	Matthew Wilcox, Saurabh Singh Sengar, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Greg KH

On Fri, Sep 22, 2023 at 9:54 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Sep 19, 2023 at 10:44 PM Saurabh Singh Sengar
> <ssengar@linux.microsoft.com> wrote:
> >
> > On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> > > On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > > > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > > > >
> > > > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > > > >>
> > > > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > > >
> > > > FWIW, one of those patches updates the docs to read,
> > > >
> > > > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > > > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > > > Filesystems can also use the ->fault method to return a PMD sized page,
> > > > so implementing this function may not be necessary.  In particular,
> > > > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > > >
> > > > Which won't work (in the general case) without this patch (well, at
> > > > least the ->huge_fault() check part).
> > > >
> > > > So, if we're advertising this is the way it works, maybe that gives a
> > > > stronger argument for addressing it sooner vs when the first in-tree
> > > > user depends on it?
> > > >
> > > > > >> If the driver is not in the tree, people don't care.
> > > > > >>
> > > > > >> You really should try upstreaming that driver.
> > > > > >>
> > > > > >>
> > > > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > > > >> supported".
> > > > > >>
> > > > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > > > >>
> > > > > >> @Willy, what are your thoughts?
> > > > > >
> > > > > > Fundamentally there was a bad assumption with the original patch --
> > > > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > > > and that's not true.  It's just that the only drivers in-tree which
> > > > > > support ->huge_fault do so in order to support DAX.
> > > > >
> > > > > Okay, and we are willing to continue supporting that then and it's
> > > > > nothing we want to stop OOT drivers from doing.
> > > > >
> > > > > Fine with me; we should probably reflect that in the patch description.
> > > >
> > > > I can change these paragraphs,
> > > >
> > > > "During the review of the above commits, it was determined that in-tree
> > > > users weren't affected by the change; most notably, since the only relevant
> > > > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > > > explicitly approved early in approval logic.  However, there is at least
> > > > one occurrence where an out-of-tree driver that used
> > > > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > > >
> > > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > > > any ->huge_fault handler a chance to handle the fault.  Note that we
> > > > don't validate the file mode or mapping alignment, which is consistent
> > > > with the behavior before the aforementioned commits."
> > > >
> > > > To read,
> > > >
> > > > "The above commits, however, overfit the existing in-tree use cases,
> > > > and assume that
> > > > the only reason to support ->huge_fault was for DAX (which is
> > > > explicitly approved early in the approval logic).
> > > > This is a bad assumption to make and unnecessarily prevents general
> > > > support of ->huge_fault by filesystems. Allow returning "true" if such
> > > > a handler exists, giving the fault path an opportunity to exercise it.
> > > >
> > > > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > > > along the fault path was that it didn't alter any in-tree users, but
> > > > was likewise similarly unnecessarily restrictive (and reads odd).
> > > > Remove the check from the fault path."
> > > >
> > >
> > >
> > > Any chance this can make it to 6.6 kernel ?
> >
> > ping
>
> I think we tend to merge this patch, but anyway it is Andrew's call.
> Included Andrew in this loop.

Sorry for delay -- just back from (another) OOO,

From this back/forth with David/Matthew, seems like we're OK saying,
"this was a mistake", and that we can take the patch (need some form
of Ack or Reviewed-by from them first, to confirm)

> > Fundamentally there was a bad assumption with the original patch --
> > it assumed that the only reason to support ->huge_fault was for DAX,
> > and that's not true.  It's just that the only drivers in-tree which
> > support ->huge_fault do so in order to support DAX.
>
> Okay, and we are willing to continue supporting that then and it's
> nothing we want to stop OOT drivers from doing.
>
> Fine with me; we should probably reflect that in the patch description.

But, I don't know about timing. We are in 6.6-rc2, and this hasn't
been exposed in Andrew's trees yet. 6.6 is looking like it could be a
LTS candidate, in which case this patch could flow backwards from
-stable (which would also land in 6.1-y) .. but I don't know if that
path is suitable for this.

Otherwise, perhaps you could include this fix
when you're ready to upstream your driver?

> >
> > >
> > > - Saurabh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
  2023-09-22 16:56                         ` Zach O'Keefe
@ 2023-09-22 17:20                           ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2023-09-22 17:20 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Yang Shi, Saurabh Singh Sengar, David Hildenbrand, Matthew Wilcox,
	Saurabh Singh Sengar, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Greg KH

On Fri, 22 Sep 2023 09:56:21 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:

> >From this back/forth with David/Matthew, seems like we're OK saying,
> "this was a mistake", and that we can take the patch (need some form
> of Ack or Reviewed-by from them first, to confirm)

Yup.  And please let's update the changelog to reflect the details
which have been discussed thus far.

If the change *makes sense* for the current kernel then let's proceed,
regardless of the broken driver issue.

But adding a cc:stable would require extra argumentation, which I will
be interested to read ;)



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-09-22 17:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-21 23:48 [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" Zach O'Keefe
2023-08-22  5:20 ` [EXTERNAL] " Saurabh Singh Sengar
2023-08-24  7:39 ` David Hildenbrand
2023-08-24 13:59   ` Zach O'Keefe
2023-08-24 14:05     ` David Hildenbrand
2023-08-24 14:47       ` Zach O'Keefe
2023-08-24 15:39         ` [EXTERNAL] " Saurabh Singh Sengar
2023-08-25  7:59           ` David Hildenbrand
2023-08-25 12:49             ` Matthew Wilcox
2023-08-25 12:58               ` David Hildenbrand
2023-08-25 15:09                 ` Zach O'Keefe
2023-09-06  6:58                   ` Saurabh Singh Sengar
2023-09-20  5:44                     ` Saurabh Singh Sengar
2023-09-22 16:54                       ` Yang Shi
2023-09-22 16:56                         ` Zach O'Keefe
2023-09-22 17:20                           ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).