linux-fsdevel.vger.kernel.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>,
	Ackerley Tng <ackerleytng@google.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	vannapurve@google.com, erdemaktas@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
Date: Fri,  2 Jun 2023 15:57:47 -0700
Message-ID: <20230602225747.103865-2-mike.kravetz@oracle.com>
In-Reply-To: <20230602225747.103865-1-mike.kravetz@oracle.com>

Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
issue showed up after the conversion of hugetlb page cache lookup code
to use page_cache_next_miss.  Code in hugetlb fallocate, userfaultfd
and GUP is now using page_cache_next_miss to determine if a page is
present in the page cache.  The following statement is used:

	present = page_cache_next_miss(mapping, index, 1) != index;

There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
   the same index will always be returned.  This wrap-around value is 0,
   so 0 will be returned even if a page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
   will be returned.  When passed a range of 1 as above, the passed
   index value will be returned even if the page is present.
The end result is that the statement above will NEVER indicate a page
is present in the cache, even if it is.
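
To make the two cases concrete, here is a rough user-space model of the
pre-patch loop.  It is only a sketch: struct toy_cache, toy_lookup and
old_next_miss are made-up names, a small bool array stands in for the
xarray, and the 'started' flag models my understanding that the first
xas_next() call on a fresh xa_state loads the entry at the starting index
without advancing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 8

/* Hypothetical stand-in for mapping->i_pages: true means "page present". */
struct toy_cache {
	bool present[TOY_NR_SLOTS];
};

/* Indices beyond the toy array are treated as holes. */
static bool toy_lookup(const struct toy_cache *c, unsigned long idx)
{
	return idx < TOY_NR_SLOTS && c->present[idx];
}

/* Model of the loop in page_cache_next_miss before this patch. */
unsigned long old_next_miss(const struct toy_cache *c,
			    unsigned long index, unsigned long max_scan)
{
	unsigned long cur = index;
	bool started = false;

	assert(c != NULL);
	while (max_scan--) {
		/* Assumed xas_next() behavior: first call does not advance. */
		if (started)
			cur++;
		started = true;
		if (!toy_lookup(c, cur))
			break;		/* gap found */
		if (cur == 0)
			break;		/* treated as wrap-around */
	}
	return cur;			/* also reached when no gap was found */
}

/* The caller-side check quoted above, applied to the model. */
bool old_is_present(const struct toy_cache *c, unsigned long index)
{
	return old_next_miss(c, index, 1) != index;
}
```

With pages present at indices 0 and 3 in the model, old_is_present()
is false for both of them, and also false for an index that really is a
hole, so the check cannot tell the cases apart.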

As noted by Ackerley in [1], users can see this as hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file.
In addition, hugetlb pages will not be included in core dumps if they
need to be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this
code and will not notice pages already present in the cache.  It may try
to allocate a new page and potentially return ENOMEM as opposed to
EEXIST.

Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Checking for index equal to the 'wrap-around' value and not exiting
  early in that case.
- Returning an index outside the range if no gap is found in the range.
- Updating the function descriptions to say the 'wrap-around' value
  could be returned if passed as index.
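
The intended post-patch behavior can be sketched in user space in the
same spirit.  Again this is only a model under stated assumptions:
struct toy_cache, toy_lookup and new_next_miss are hypothetical names, a
bool array replaces the xarray, and the first xas_next() call is assumed
to load the entry at the starting index without advancing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 8

/* Hypothetical stand-in for mapping->i_pages: true means "page present". */
struct toy_cache {
	bool present[TOY_NR_SLOTS];
};

/* Indices beyond the toy array are treated as holes. */
static bool toy_lookup(const struct toy_cache *c, unsigned long idx)
{
	return idx < TOY_NR_SLOTS && c->present[idx];
}

/* Model of the loop in page_cache_next_miss with this patch applied. */
unsigned long new_next_miss(const struct toy_cache *c,
			    unsigned long index, unsigned long max_scan)
{
	unsigned long cur = index;
	bool started = false;

	assert(c != NULL);
	while (max_scan--) {
		/* Assumed xas_next() behavior: first call does not advance. */
		if (started)
			cur++;
		started = true;
		if (!toy_lookup(c, cur))
			return cur;	/* gap found */
		if (cur == 0 && index != 0)
			return cur;	/* real wrap-around only */
	}
	/* No gap in range and no wrap-around: index beyond the range. */
	return cur + 1;
}

/* The caller-side check from the changelog, applied to the model. */
bool new_is_present(const struct toy_cache *c, unsigned long index)
{
	return new_next_miss(c, index, 1) != index;
}
```

In this model a page at index 0 or 3 is now reported as present, a hole
is still reported as a hole, and a fully populated range of length
max_scan yields index + max_scan, which matches the documented
'return - index >= max_scan' condition.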

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/

Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/filemap.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 71dc90f64e43..123540c7ba45 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1733,7 +1733,9 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'return - index >= max_scan' will be true).
- * In the rare case of index wrap-around, 0 will be returned.
+ * In the rare case of index wrap-around, 0 will be returned.  0 will also
+ * be returned if index == 0 and there is a gap at the index.  We can not
+ * wrap-around if passed index == 0.
  */
 pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1743,12 +1745,13 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_next(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == 0)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == 0 && index != 0)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index + 1;
 }
 EXPORT_SYMBOL(page_cache_next_miss);
 
@@ -1769,7 +1772,9 @@ EXPORT_SYMBOL(page_cache_next_miss);
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'index - return >= max_scan' will be true).
- * In the rare case of wrap-around, ULONG_MAX will be returned.
+ * In the rare case of wrap-around, ULONG_MAX will be returned.  ULONG_MAX
+ * will also be returned if index == ULONG_MAX and there is a gap at the
+ * index.  We can not wrap-around if passed index == ULONG_MAX.
  */
 pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1779,12 +1784,13 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_prev(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == ULONG_MAX)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == ULONG_MAX && index != ULONG_MAX)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index - 1;
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
-- 
2.40.1

