[PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()

stable.vger.kernel.org archive mirror

* [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Sidhartha Kumar @ 2023-06-06 17:20 UTC
  To: stable, linux-kernel
  Cc: songmuchun, mike.kravetz, Sidhartha Kumar, Ackerley Tng

As reported by Ackerley[1], the use of page_cache_next_miss() in
hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
the same offset fails with -EEXIST. Revert this change and go back to the
previous method of getting the folio from the page cache and then dropping
the reference on success.

hugetlbfs_pagecache_present() was also refactored to use
page_cache_next_miss(); revert the usage there as well.
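A minimal userspace model of the two presence checks may help. It is a sketch, not kernel code: toy_mapping, toy_next_miss(), toy_get_folio() and the other toy_* names are hypothetical. toy_next_miss() follows the documented page_cache_next_miss() contract (return the first hole in [index, index + max_scan), or an index outside that range if there is none). toy_next_miss_buggy() assumes, as a simplification, that the faulty lookup reported the last scanned index when no hole was found; with max_scan == 1 that makes a cached index look absent, which would be consistent with the -EEXIST symptom. The get/put variant mirrors the lookup this revert restores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy "page cache": one slot per huge-page index, each slot
 * either empty (NULL) or pointing at a refcounted folio. This models only
 * the lookup contracts discussed above; it is not kernel code.
 */
#define TOY_SLOTS 8

struct toy_folio {
	int refcount;
};

struct toy_mapping {
	struct toy_folio *slots[TOY_SLOTS];
};

/*
 * Models the documented page_cache_next_miss() contract: return the first
 * index in [index, index + max_scan) that has no entry, or an index outside
 * that range (here index + max_scan) if every scanned slot is populated.
 * Indices past the end of the toy array count as holes.
 */
static size_t toy_next_miss(const struct toy_mapping *m, size_t index,
			    size_t max_scan)
{
	for (size_t i = index; i < index + max_scan; i++)
		if (i >= TOY_SLOTS || m->slots[i] == NULL)
			return i;
	return index + max_scan;
}

/*
 * Simplified model of the failure mode (an assumption): when no hole is
 * found, report the last scanned index instead of an out-of-range value.
 * Callers must pass max_scan >= 1.
 */
static size_t toy_next_miss_buggy(const struct toy_mapping *m, size_t index,
				  size_t max_scan)
{
	for (size_t i = index; i < index + max_scan; i++)
		if (i >= TOY_SLOTS || m->slots[i] == NULL)
			return i;
	return index + max_scan - 1;
}

typedef size_t (*next_miss_fn)(const struct toy_mapping *, size_t, size_t);

/* Presence test in the style of the reverted code: "next miss != idx". */
static bool present_via_next_miss(const struct toy_mapping *m, size_t idx,
				  next_miss_fn next_miss)
{
	return next_miss(m, idx, 1) != idx;
}

/* Lookup that takes a reference on a hit and returns NULL on a miss. */
static struct toy_folio *toy_get_folio(struct toy_mapping *m, size_t idx)
{
	struct toy_folio *folio = idx < TOY_SLOTS ? m->slots[idx] : NULL;

	if (folio)
		folio->refcount++;
	return folio;
}

static void toy_folio_put(struct toy_folio *folio)
{
	folio->refcount--;
}

/* Presence test in the style this revert restores: get, put, test for NULL. */
static bool present_via_get_put(struct toy_mapping *m, size_t idx)
{
	struct toy_folio *folio = toy_get_folio(m, idx);

	if (folio)
		toy_folio_put(folio);
	return folio != NULL;
}
```

With only slot 3 populated, present_via_next_miss(&m, 3, toy_next_miss) is true, the same call with toy_next_miss_buggy is false, and present_via_get_put(&m, 3) is true and leaves the folio's refcount where it started.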

User-visible impacts include hugetlb fallocate incorrectly returning
-EEXIST if pages are already present in the file. In addition, hugetlb
pages will not be included in core dumps if they need to be brought in via
GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
already present in the cache; it may try to allocate a new page and
potentially return -ENOMEM instead of -EEXIST.
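A minimal userspace sketch of the fallocate() symptom, assuming Linux with glibc 2.27 or later for memfd_create(). fallocate_twice() is a hypothetical helper, not part of any existing test suite: it allocates the same range twice and reports the negative errno of the first call that fails. Because the second call covers a range that is already allocated, it is expected to succeed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>     /* fallocate() (needs _GNU_SOURCE) */
#include <sys/mman.h>  /* memfd_create() (needs _GNU_SOURCE, glibc >= 2.27) */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate the same range of fd twice with mode 0.
 * Returns 0 if both calls succeed, otherwise the negative errno of the
 * first call that failed. The second call covers an already-allocated
 * range, so on a correct kernel it is expected to succeed as well.
 */
static int fallocate_twice(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, 0, offset, len) != 0)
		return -errno;
	if (fallocate(fd, 0, offset, len) != 0)
		return -errno;
	return 0;
}
```

To exercise hugetlbfs rather than tmpfs, pass a descriptor from memfd_create(name, MFD_HUGETLB) with len set to the huge page size (2 MiB by default on x86-64), with huge pages reserved through /proc/sys/vm/nr_hugepages. Going by the report, an affected 6.3 kernel would be expected to return -EEXIST from the second call, and 0 once this revert is applied.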

Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Cc: <stable@vger.kernel.org> #v6.3
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
---

This revert is the safest way to fix 6.3. The upstream fix will either
fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
new function that checks whether a page is present in the page cache. Both
directions are still under review, so we can use this safe and simple
fix for 6.3.

 fs/hugetlbfs/inode.c |  8 +++-----
 mm/hugetlb.c         | 11 +++++------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da5675..586767afb4cdb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;

 		cond_resched();

@@ -845,10 +844,9 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);

 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		folio = filemap_get_folio(mapping, index);
+		if (folio) {
+			folio_put(folio);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 245038a9fe4ea..29ab27d2a3ef5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5666,13 +5666,12 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, vma, address);
-	bool present;
-
-	rcu_read_lock();
-	present = page_cache_next_miss(mapping, idx, 1) != idx;
-	rcu_read_unlock();
+	struct folio *folio;

-	return present;
+	folio = filemap_get_folio(mapping, idx);
+	if (folio)
+		folio_put(folio);
+	return folio != NULL;
 }

 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
-- 
2.40.1


* Re: [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Greg KH @ 2023-06-06 17:38 UTC
  To: Sidhartha Kumar
  Cc: stable, linux-kernel, songmuchun, mike.kravetz, Ackerley Tng

On Tue, Jun 06, 2023 at 10:20:22AM -0700, Sidhartha Kumar wrote:
> This revert is the safest way to fix 6.3. The upstream fix will either
> fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
> new function to check if a page is present in the page cache. Both
> directions are currently under review so we can use this safe and simple
> fix for 6.3

Is there any specific reason why we don't just wait for the fix for
Linus's tree before applying this one, or applying the real fix instead?

thanks,

greg k-h


* Re: [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Sidhartha Kumar @ 2023-06-06 18:13 UTC
  To: Greg KH; +Cc: stable, linux-kernel, songmuchun, mike.kravetz, Ackerley Tng

On 6/6/23 10:38 AM, Greg KH wrote:
> Is there any specific reason why we don't just wait for the fix for
> Linus's tree before applying this one, or applying the real fix instead?

I missed Andrew's message stating he would prefer the real fix[1].

Sorry for the noise,
Sidhartha Kumar

[1] https://lore.kernel.org/lkml/20230603022209.GA114055@monkey/T/#mea6c8a015dbea5f9c2be88b9791996f4be6c2de8



* Re: [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Greg KH @ 2023-06-07 18:33 UTC
  To: Sidhartha Kumar
  Cc: stable, linux-kernel, songmuchun, mike.kravetz, Ackerley Tng

On Tue, Jun 06, 2023 at 11:13:05AM -0700, Sidhartha Kumar wrote:
> I missed Andrew's message stating he would prefer the real fix[1].
> 
> Sorry for the noise,
> Sidhartha Kumar
> 
> [1] https://lore.kernel.org/lkml/20230603022209.GA114055@monkey/T/#mea6c8a015dbea5f9c2be88b9791996f4be6c2de8

Great, is that going to Linus's tree soon?

thanks,

greg k-h


* Re: [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Sidhartha Kumar @ 2023-06-07 20:35 UTC
  To: Greg KH; +Cc: stable, linux-kernel, songmuchun, mike.kravetz, Ackerley Tng

On 6/7/23 11:33 AM, Greg KH wrote:
> Great, is that going to Linus's tree soon?
> 

Andrew just added it to mm-hotfixes-stable, so it should be in Linus's
tree soon.

Thanks,
Sidhartha Kumar




* [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Sidhartha Kumar @ 2023-06-29 21:18 UTC
  To: stable, linux-kernel
  Cc: songmuchun, mike.kravetz, Sidhartha Kumar, Ackerley Tng

commit fd4aed8d985a3236d0877ff6d0c80ad39d4ce81a upstream

Ackerley Tng reported an issue with hugetlbfs fallocate as noted in the
Closes tag.  The issue showed up after the conversion of hugetlb page
cache lookup code to use page_cache_next_miss().  User-visible effects are:

- hugetlbfs fallocate incorrectly returns -EEXIST if pages are present
  in the file.
- hugetlb pages will not be included in core dumps if they need to be
  brought in via GUP.
- userfaultfd UFFDIO_COPY will not notice pages already present in the
  cache.  It may try to allocate a new page and potentially return
  ENOMEM as opposed to EEXIST.

Revert the use of page_cache_next_miss() in hugetlb code.

The upstream fix[2] cannot be used directly, as the return value of
filemap_get_folio() changed between 6.3 and upstream: 6.3 returns NULL on
a miss, while upstream returns an ERR_PTR() value.
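To illustrate why the NULL test cannot be carried over unchanged, here is a userspace sketch of the two return conventions. ERR_PTR(), IS_ERR() and MAX_ERRNO are re-created after the kernel's include/linux/err.h; lookup_63(), lookup_upstream() and the present_*() helpers are hypothetical stand-ins for filemap_get_folio() and its callers, assuming upstream reports a miss as ERR_PTR(-ENOENT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creations of the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_folio {
	int dummy;
};

/* Hypothetical 6.3-style lookup: NULL on a miss. */
static struct toy_folio *lookup_63(struct toy_folio *cached)
{
	return cached;
}

/* Hypothetical upstream-style lookup: ERR_PTR(-ENOENT) on a miss. */
static struct toy_folio *lookup_upstream(struct toy_folio *cached)
{
	return cached ? cached : ERR_PTR(-ENOENT);
}

/* Presence test matching the 6.3 convention, as in this backport. */
static bool present_63(struct toy_folio *folio)
{
	return folio != NULL;
}

/* Presence test matching the upstream convention. */
static bool present_upstream(struct toy_folio *folio)
{
	return !IS_ERR(folio);
}
```

Mixing conventions goes wrong in both directions: present_63(lookup_upstream(NULL)) is true, because an error pointer is non-NULL, and present_upstream(lookup_63(NULL)) is also true, because IS_ERR(NULL) is false. Either way a miss is misread as a cached folio, which is why the backport keeps the NULL check.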

Closes: https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com
Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Cc: <stable@vger.kernel.org> #v6.3
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
[2] https://lore.kernel.org/lkml/20230621230255.GD4155@monkey/
---

 fs/hugetlbfs/inode.c |  8 +++-----
 mm/hugetlb.c         | 11 +++++------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da5675..586767afb4cdb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;
 
 		cond_resched();
 
@@ -845,10 +844,9 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		folio = filemap_get_folio(mapping, index);
+		if (folio) {
+			folio_put(folio);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 245038a9fe4ea..29ab27d2a3ef5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5666,13 +5666,12 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, vma, address);
-	bool present;
-
-	rcu_read_lock();
-	present = page_cache_next_miss(mapping, idx, 1) != idx;
-	rcu_read_unlock();
+	struct folio *folio;
 
-	return present;
+	folio = filemap_get_folio(mapping, idx);
+	if (folio)
+		folio_put(folio);
+	return folio != NULL;
 }
 
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
-- 
2.40.1



* Re: [PATCH 6.3.y] mm/hugetlb: revert use of page_cache_next_miss()
From: Greg KH @ 2023-07-03 18:31 UTC
  To: Sidhartha Kumar
  Cc: stable, linux-kernel, songmuchun, mike.kravetz, Ackerley Tng

On Thu, Jun 29, 2023 at 05:18:17PM -0400, Sidhartha Kumar wrote:
> commit fd4aed8d985a3236d0877ff6d0c80ad39d4ce81a upstream

Now queued up, thanks.

greg k-h
