Linux filesystem development
 help / color / mirror / Atom feed
* [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
@ 2026-05-08 11:13 Dmitry Antipov
  2026-05-08 18:33 ` Caleb Sander Mateos
  2026-05-08 21:48 ` Andrew Morton
  0 siblings, 2 replies; 4+ messages in thread
From: Dmitry Antipov @ 2026-05-08 11:13 UTC (permalink / raw)
  To: Jens Axboe, Andrew Morton
  Cc: Christoph Hellwig, linux-block, linux-fsdevel, lvc-project,
	Dmitry Antipov, Fedor Pchelkin

Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
one isn't large enough, the worst-case scenario may be:

...
struct page *stack_pages[SMALL];
struct page **pages = stack_pages;
...
if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
        /* Even in case of error, new pagelist may be allocated */
        if (pages != stack_pages)
                kvfree(pages);                                  [1]
        /* The rest of error handling and return */
}
/* Regular flow */
...
if (pages != stack_pages)
        kvfree(pages);
...
return 0;

If you're unlucky so SMALL amount of pages wasn't enough and new
pagelist was allocated, missing [1] causes the memory leak similar
to one I've recently observed and fixed for 6.12 in [2]. So adjust
'iov_iter_extract_pages()' to make such a cleanup itself rather than
rely on caller's handling on error paths, thus making [1] not needed.

[2] https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u

Suggested-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
 lib/iov_iter.c | 54 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af7..46dd11913df0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1807,7 +1807,8 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
  *  (*) Use with ITER_DISCARD is not supported as that has no content.
  *
  * On success, the function sets *@pages to the new pagelist, if allocated, and
- * sets *offset0 to the offset into the first page.
+ * sets *offset0 to the offset into the first page. On error, new pagelist
+ * is freed if was allocated, and *@pages sets back to its original value.
  *
  * It may also return -ENOMEM and -EFAULT.
  */
@@ -1818,31 +1819,42 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
 			       iov_iter_extraction_t extraction_flags,
 			       size_t *offset0)
 {
+	struct page **oldpages = *pages;
+	int ret;
+
 	maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
 	if (!maxsize)
 		return 0;
 
 	if (likely(user_backed_iter(i)))
-		return iov_iter_extract_user_pages(i, pages, maxsize,
-						   maxpages, extraction_flags,
-						   offset0);
-	if (iov_iter_is_kvec(i))
-		return iov_iter_extract_kvec_pages(i, pages, maxsize,
-						   maxpages, extraction_flags,
-						   offset0);
-	if (iov_iter_is_bvec(i))
-		return iov_iter_extract_bvec_pages(i, pages, maxsize,
-						   maxpages, extraction_flags,
-						   offset0);
-	if (iov_iter_is_folioq(i))
-		return iov_iter_extract_folioq_pages(i, pages, maxsize,
-						     maxpages, extraction_flags,
-						     offset0);
-	if (iov_iter_is_xarray(i))
-		return iov_iter_extract_xarray_pages(i, pages, maxsize,
-						     maxpages, extraction_flags,
-						     offset0);
-	return -EFAULT;
+		ret = iov_iter_extract_user_pages(i, pages, maxsize,
+						  maxpages, extraction_flags,
+						  offset0);
+	else if (iov_iter_is_kvec(i))
+		ret = iov_iter_extract_kvec_pages(i, pages, maxsize,
+						  maxpages, extraction_flags,
+						  offset0);
+	else if (iov_iter_is_bvec(i))
+		ret = iov_iter_extract_bvec_pages(i, pages, maxsize,
+						  maxpages, extraction_flags,
+						  offset0);
+	else if (iov_iter_is_folioq(i))
+		ret = iov_iter_extract_folioq_pages(i, pages, maxsize,
+						    maxpages, extraction_flags,
+						    offset0);
+	else if (iov_iter_is_xarray(i))
+		ret = iov_iter_extract_xarray_pages(i, pages, maxsize,
+						    maxpages, extraction_flags,
+						    offset0);
+	else
+		ret = -EFAULT;
+
+	if (unlikely(ret) && *pages && *pages != oldpages) {
+		kvfree(*pages);
+		*pages = oldpages;
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
  2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
@ 2026-05-08 18:33 ` Caleb Sander Mateos
  2026-05-11  6:35   ` Christoph Hellwig
  2026-05-08 21:48 ` Andrew Morton
  1 sibling, 1 reply; 4+ messages in thread
From: Caleb Sander Mateos @ 2026-05-08 18:33 UTC (permalink / raw)
  To: Dmitry Antipov
  Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, linux-block,
	linux-fsdevel, lvc-project, Fedor Pchelkin

On Fri, May 8, 2026 at 4:16 AM Dmitry Antipov <dmantipov@yandex.ru> wrote:
>
> Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
> one isn't large enough, the worst-case scenario may be:
>
> ...
> struct page *stack_pages[SMALL];
> struct page **pages = stack_pages;
> ...
> if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
>         /* Even in case of error, new pagelist may be allocated */
>         if (pages != stack_pages)
>                 kvfree(pages);                                  [1]

iov_iter_extract_pages() will only allocate a pages array if the
initial struct page ** passed is NULL (see want_pages_array()). So the
condition pages != stack_pages will never be true. Indeed, it looks
like *all* callers of iov_iter_extract_pages() pass a non-NULL struct
page **. Would it make sense for iov_iter_extract_pages() to require a
pre-allocated pages array and remove support for allocating one?

>         /* The rest of error handling and return */
> }
> /* Regular flow */
> ...
> if (pages != stack_pages)
>         kvfree(pages);
> ...
> return 0;
>
> If you're unlucky so SMALL amount of pages wasn't enough and new
> pagelist was allocated, missing [1] causes the memory leak similar
> to one I've recently observed and fixed for 6.12 in [2]. So adjust
> 'iov_iter_extract_pages()' to make such a cleanup itself rather than
> rely on caller's handling on error paths, thus making [1] not needed.
>
> [2] https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u
>
> Suggested-by: Fedor Pchelkin <pchelkin@ispras.ru>
> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
> ---
>  lib/iov_iter.c | 54 ++++++++++++++++++++++++++++++--------------------
>  1 file changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 243662af1af7..46dd11913df0 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1807,7 +1807,8 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
>   *  (*) Use with ITER_DISCARD is not supported as that has no content.
>   *
>   * On success, the function sets *@pages to the new pagelist, if allocated, and
> - * sets *offset0 to the offset into the first page.
> + * sets *offset0 to the offset into the first page. On error, new pagelist
> + * is freed if was allocated, and *@pages sets back to its original value.
>   *
>   * It may also return -ENOMEM and -EFAULT.
>   */
> @@ -1818,31 +1819,42 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
>                                iov_iter_extraction_t extraction_flags,
>                                size_t *offset0)
>  {
> +       struct page **oldpages = *pages;
> +       int ret;

Should be ssize_t to avoid truncation?

> +
>         maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
>         if (!maxsize)
>                 return 0;
>
>         if (likely(user_backed_iter(i)))
> -               return iov_iter_extract_user_pages(i, pages, maxsize,
> -                                                  maxpages, extraction_flags,
> -                                                  offset0);
> -       if (iov_iter_is_kvec(i))
> -               return iov_iter_extract_kvec_pages(i, pages, maxsize,
> -                                                  maxpages, extraction_flags,
> -                                                  offset0);
> -       if (iov_iter_is_bvec(i))
> -               return iov_iter_extract_bvec_pages(i, pages, maxsize,
> -                                                  maxpages, extraction_flags,
> -                                                  offset0);
> -       if (iov_iter_is_folioq(i))
> -               return iov_iter_extract_folioq_pages(i, pages, maxsize,
> -                                                    maxpages, extraction_flags,
> -                                                    offset0);
> -       if (iov_iter_is_xarray(i))
> -               return iov_iter_extract_xarray_pages(i, pages, maxsize,
> -                                                    maxpages, extraction_flags,
> -                                                    offset0);
> -       return -EFAULT;
> +               ret = iov_iter_extract_user_pages(i, pages, maxsize,
> +                                                 maxpages, extraction_flags,
> +                                                 offset0);
> +       else if (iov_iter_is_kvec(i))
> +               ret = iov_iter_extract_kvec_pages(i, pages, maxsize,
> +                                                 maxpages, extraction_flags,
> +                                                 offset0);
> +       else if (iov_iter_is_bvec(i))
> +               ret = iov_iter_extract_bvec_pages(i, pages, maxsize,
> +                                                 maxpages, extraction_flags,
> +                                                 offset0);
> +       else if (iov_iter_is_folioq(i))
> +               ret = iov_iter_extract_folioq_pages(i, pages, maxsize,
> +                                                   maxpages, extraction_flags,
> +                                                   offset0);
> +       else if (iov_iter_is_xarray(i))
> +               ret = iov_iter_extract_xarray_pages(i, pages, maxsize,
> +                                                   maxpages, extraction_flags,
> +                                                   offset0);
> +       else
> +               ret = -EFAULT;
> +
> +       if (unlikely(ret) && *pages && *pages != oldpages) {

iov_iter_extract_pages() returns a positive value (number of bytes
extracted) on success (even if all maxsize bytes were extracted), so I
don't think this condition is correct. Callers seem to be inconsistent
about whether they treat ret < 0 or ret <= 0 as the error case, which
is another argument not to handle freeing the pages array inside
iov_iter_extract_pages().

Best,
Caleb

> +               kvfree(*pages);
> +               *pages = oldpages;
> +       }
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
>
> --
> 2.54.0
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
  2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
  2026-05-08 18:33 ` Caleb Sander Mateos
@ 2026-05-08 21:48 ` Andrew Morton
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2026-05-08 21:48 UTC (permalink / raw)
  To: Dmitry Antipov
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-fsdevel,
	lvc-project, Fedor Pchelkin

On Fri,  8 May 2026 14:13:29 +0300 Dmitry Antipov <dmantipov@yandex.ru> wrote:

> Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
> one isn't large enough, the worst-case scenario may be:
> 
> ...
> struct page *stack_pages[SMALL];
> struct page **pages = stack_pages;
> ...
> if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
>         /* Even in case of error, new pagelist may be allocated */
>         if (pages != stack_pages)
>                 kvfree(pages);                                  [1]
>         /* The rest of error handling and return */
> }
> /* Regular flow */
> ...
> if (pages != stack_pages)
>         kvfree(pages);
> ...
> return 0;
> 
> If you're unlucky so SMALL amount of pages wasn't enough and new
> pagelist was allocated, missing [1] causes the memory leak similar
> to one I've recently observed and fixed for 6.12 in [2]. So adjust
> 'iov_iter_extract_pages()' to make such a cleanup itself rather than
> rely on caller's handling on error paths, thus making [1] not needed.

AI review said things:
	https://sashiko.dev/#/patchset/20260508111329.329943-1-dmantipov@yandex.ru


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
  2026-05-08 18:33 ` Caleb Sander Mateos
@ 2026-05-11  6:35   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-11  6:35 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Dmitry Antipov, Jens Axboe, Andrew Morton, Christoph Hellwig,
	linux-block, linux-fsdevel, lvc-project, Fedor Pchelkin,
	David Howells, ric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, v9fs, Ilya Dryomov, Alex Markuze,
	Viacheslav Dubeyko, ceph-devel, Pranjal Shrivastava, linux-nfs,
	Christian Schoenebeck

[adding authors and maintainers of relevant code]

On Fri, May 08, 2026 at 11:33:50AM -0700, Caleb Sander Mateos wrote:
> iov_iter_extract_pages() will only allocate a pages array if the
> initial struct page ** passed is NULL (see want_pages_array()). So the
> condition pages != stack_pages will never be true. Indeed, it looks
> like *all* callers of iov_iter_extract_pages() pass a non-NULL struct
> page **. Would it make sense for iov_iter_extract_pages() to require a
> pre-allocated pages array and remove support for allocating one?

I think the idea was to support this to replace existing users of
iov_iter_get_pages_alloc2.  Which is urgently neede as those missing
the proper page pinning support.  OTOH we should not keep dead code
around just in case.

For NFS, Pranjal has an initial series to convert away from
iov_iter_get_pages_alloc2, which makes use of the NULL pages argument
to iov_iter_extract_pages.

For 9p, Dominique had an untested patch in December that drops
iov_iter_get_pages_alloc2 in favor of a much better high level approach
that doesn't even involve iov_iter_extract_pages, which seems to not have
made it anywhere.  It would be great to get this going again.

net/ceph/ needs to also do work.  Not an expert on the code, but given
how it is based off the encryption flag it looks kinda fishy and
iov_iter_extract_pages might not be the best direct replacement.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-11  6:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-11  6:35   ` Christoph Hellwig
2026-05-08 21:48 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox