* [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
@ 2026-05-08 11:13 Dmitry Antipov
2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-08 21:48 ` Andrew Morton
0 siblings, 2 replies; 7+ messages in thread
From: Dmitry Antipov @ 2026-05-08 11:13 UTC (permalink / raw)
To: Jens Axboe, Andrew Morton
Cc: Christoph Hellwig, linux-block, linux-fsdevel, lvc-project,
Dmitry Antipov, Fedor Pchelkin
Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
one isn't large enough, the worst-case scenario may be:
...
struct page *stack_pages[SMALL];
struct page **pages = stack_pages;
...
if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
/* Even in case of error, new pagelist may be allocated */
if (pages != stack_pages)
kvfree(pages); [1]
/* The rest of error handling and return */
}
/* Regular flow */
...
if (pages != stack_pages)
kvfree(pages);
...
return 0;
If you're unlucky so SMALL amount of pages wasn't enough and new
pagelist was allocated, missing [1] causes the memory leak similar
to one I've recently observed and fixed for 6.12 in [2]. So adjust
'iov_iter_extract_pages()' to make such a cleanup itself rather than
rely on caller's handling on error paths, thus making [1] not needed.
[2] https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u
Suggested-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
lib/iov_iter.c | 54 ++++++++++++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 21 deletions(-)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af7..46dd11913df0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1807,7 +1807,8 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
* (*) Use with ITER_DISCARD is not supported as that has no content.
*
* On success, the function sets *@pages to the new pagelist, if allocated, and
- * sets *offset0 to the offset into the first page.
+ * sets *offset0 to the offset into the first page. On error, new pagelist
+ * is freed if was allocated, and *@pages sets back to its original value.
*
* It may also return -ENOMEM and -EFAULT.
*/
@@ -1818,31 +1819,42 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
iov_iter_extraction_t extraction_flags,
size_t *offset0)
{
+ struct page **oldpages = *pages;
+ int ret;
+
maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
if (!maxsize)
return 0;
if (likely(user_backed_iter(i)))
- return iov_iter_extract_user_pages(i, pages, maxsize,
- maxpages, extraction_flags,
- offset0);
- if (iov_iter_is_kvec(i))
- return iov_iter_extract_kvec_pages(i, pages, maxsize,
- maxpages, extraction_flags,
- offset0);
- if (iov_iter_is_bvec(i))
- return iov_iter_extract_bvec_pages(i, pages, maxsize,
- maxpages, extraction_flags,
- offset0);
- if (iov_iter_is_folioq(i))
- return iov_iter_extract_folioq_pages(i, pages, maxsize,
- maxpages, extraction_flags,
- offset0);
- if (iov_iter_is_xarray(i))
- return iov_iter_extract_xarray_pages(i, pages, maxsize,
- maxpages, extraction_flags,
- offset0);
- return -EFAULT;
+ ret = iov_iter_extract_user_pages(i, pages, maxsize,
+ maxpages, extraction_flags,
+ offset0);
+ else if (iov_iter_is_kvec(i))
+ ret = iov_iter_extract_kvec_pages(i, pages, maxsize,
+ maxpages, extraction_flags,
+ offset0);
+ else if (iov_iter_is_bvec(i))
+ ret = iov_iter_extract_bvec_pages(i, pages, maxsize,
+ maxpages, extraction_flags,
+ offset0);
+ else if (iov_iter_is_folioq(i))
+ ret = iov_iter_extract_folioq_pages(i, pages, maxsize,
+ maxpages, extraction_flags,
+ offset0);
+ else if (iov_iter_is_xarray(i))
+ ret = iov_iter_extract_xarray_pages(i, pages, maxsize,
+ maxpages, extraction_flags,
+ offset0);
+ else
+ ret = -EFAULT;
+
+ if (unlikely(ret) && *pages && *pages != oldpages) {
+ kvfree(*pages);
+ *pages = oldpages;
+ }
+
+ return ret;
}
EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
@ 2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-11 6:35 ` Christoph Hellwig
` (2 more replies)
2026-05-08 21:48 ` Andrew Morton
1 sibling, 3 replies; 7+ messages in thread
From: Caleb Sander Mateos @ 2026-05-08 18:33 UTC (permalink / raw)
To: Dmitry Antipov
Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, linux-block,
linux-fsdevel, lvc-project, Fedor Pchelkin
On Fri, May 8, 2026 at 4:16 AM Dmitry Antipov <dmantipov@yandex.ru> wrote:
>
> Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
> one isn't large enough, the worst-case scenario may be:
>
> ...
> struct page *stack_pages[SMALL];
> struct page **pages = stack_pages;
> ...
> if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
> /* Even in case of error, new pagelist may be allocated */
> if (pages != stack_pages)
> kvfree(pages); [1]
iov_iter_extract_pages() will only allocate a pages array if the
initial struct page ** passed is NULL (see want_pages_array()). So the
condition pages != stack_pages will never be true. Indeed, it looks
like *all* callers of iov_iter_extract_pages() pass a non-NULL struct
page **. Would it make sense for iov_iter_extract_pages() to require a
pre-allocated pages array and remove support for allocating one?
> /* The rest of error handling and return */
> }
> /* Regular flow */
> ...
> if (pages != stack_pages)
> kvfree(pages);
> ...
> return 0;
>
> If you're unlucky so SMALL amount of pages wasn't enough and new
> pagelist was allocated, missing [1] causes the memory leak similar
> to one I've recently observed and fixed for 6.12 in [2]. So adjust
> 'iov_iter_extract_pages()' to make such a cleanup itself rather than
> rely on caller's handling on error paths, thus making [1] not needed.
>
> [2] https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u
>
> Suggested-by: Fedor Pchelkin <pchelkin@ispras.ru>
> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
> ---
> lib/iov_iter.c | 54 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 243662af1af7..46dd11913df0 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1807,7 +1807,8 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
> * (*) Use with ITER_DISCARD is not supported as that has no content.
> *
> * On success, the function sets *@pages to the new pagelist, if allocated, and
> - * sets *offset0 to the offset into the first page.
> + * sets *offset0 to the offset into the first page. On error, new pagelist
> + * is freed if was allocated, and *@pages sets back to its original value.
> *
> * It may also return -ENOMEM and -EFAULT.
> */
> @@ -1818,31 +1819,42 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
> iov_iter_extraction_t extraction_flags,
> size_t *offset0)
> {
> + struct page **oldpages = *pages;
> + int ret;
Should be ssize_t to avoid truncation?
> +
> maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
> if (!maxsize)
> return 0;
>
> if (likely(user_backed_iter(i)))
> - return iov_iter_extract_user_pages(i, pages, maxsize,
> - maxpages, extraction_flags,
> - offset0);
> - if (iov_iter_is_kvec(i))
> - return iov_iter_extract_kvec_pages(i, pages, maxsize,
> - maxpages, extraction_flags,
> - offset0);
> - if (iov_iter_is_bvec(i))
> - return iov_iter_extract_bvec_pages(i, pages, maxsize,
> - maxpages, extraction_flags,
> - offset0);
> - if (iov_iter_is_folioq(i))
> - return iov_iter_extract_folioq_pages(i, pages, maxsize,
> - maxpages, extraction_flags,
> - offset0);
> - if (iov_iter_is_xarray(i))
> - return iov_iter_extract_xarray_pages(i, pages, maxsize,
> - maxpages, extraction_flags,
> - offset0);
> - return -EFAULT;
> + ret = iov_iter_extract_user_pages(i, pages, maxsize,
> + maxpages, extraction_flags,
> + offset0);
> + else if (iov_iter_is_kvec(i))
> + ret = iov_iter_extract_kvec_pages(i, pages, maxsize,
> + maxpages, extraction_flags,
> + offset0);
> + else if (iov_iter_is_bvec(i))
> + ret = iov_iter_extract_bvec_pages(i, pages, maxsize,
> + maxpages, extraction_flags,
> + offset0);
> + else if (iov_iter_is_folioq(i))
> + ret = iov_iter_extract_folioq_pages(i, pages, maxsize,
> + maxpages, extraction_flags,
> + offset0);
> + else if (iov_iter_is_xarray(i))
> + ret = iov_iter_extract_xarray_pages(i, pages, maxsize,
> + maxpages, extraction_flags,
> + offset0);
> + else
> + ret = -EFAULT;
> +
> + if (unlikely(ret) && *pages && *pages != oldpages) {
iov_iter_extract_pages() returns a positive value (number of bytes
extracted) on success (even if all maxsize bytes were extracted), so I
don't think this condition is correct. Callers seem to be inconsistent
about whether they treat ret < 0 or ret <= 0 as the error case, which
is another argument not to handle freeing the pages array inside
iov_iter_extract_pages().
Best,
Caleb
> + kvfree(*pages);
> + *pages = oldpages;
> + }
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
>
> --
> 2.54.0
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
2026-05-08 18:33 ` Caleb Sander Mateos
@ 2026-05-08 21:48 ` Andrew Morton
1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2026-05-08 21:48 UTC (permalink / raw)
To: Dmitry Antipov
Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-fsdevel,
lvc-project, Fedor Pchelkin
On Fri, 8 May 2026 14:13:29 +0300 Dmitry Antipov <dmantipov@yandex.ru> wrote:
> Since 'iov_iter_extract_pages()' may allocate new pagelist if the passed
> one isn't large enough, the worst-case scenario may be:
>
> ...
> struct page *stack_pages[SMALL];
> struct page **pages = stack_pages;
> ...
> if (iov_iter_extract_pages(i..., &pages, ...) <= 0) {
> /* Even in case of error, new pagelist may be allocated */
> if (pages != stack_pages)
> kvfree(pages); [1]
> /* The rest of error handling and return */
> }
> /* Regular flow */
> ...
> if (pages != stack_pages)
> kvfree(pages);
> ...
> return 0;
>
> If you're unlucky so SMALL amount of pages wasn't enough and new
> pagelist was allocated, missing [1] causes the memory leak similar
> to one I've recently observed and fixed for 6.12 in [2]. So adjust
> 'iov_iter_extract_pages()' to make such a cleanup itself rather than
> rely on caller's handling on error paths, thus making [1] not needed.
AI review said things:
https://sashiko.dev/#/patchset/20260508111329.329943-1-dmantipov@yandex.ru
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-08 18:33 ` Caleb Sander Mateos
@ 2026-05-11 6:35 ` Christoph Hellwig
2026-05-12 8:12 ` Dmitry Antipov
2026-05-12 11:23 ` Dmitry Antipov
2 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-11 6:35 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Dmitry Antipov, Jens Axboe, Andrew Morton, Christoph Hellwig,
linux-block, linux-fsdevel, lvc-project, Fedor Pchelkin,
David Howells, ric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet, v9fs, Ilya Dryomov, Alex Markuze,
Viacheslav Dubeyko, ceph-devel, Pranjal Shrivastava, linux-nfs,
Christian Schoenebeck
[adding authors and maintainers of relevant code]
On Fri, May 08, 2026 at 11:33:50AM -0700, Caleb Sander Mateos wrote:
> iov_iter_extract_pages() will only allocate a pages array if the
> initial struct page ** passed is NULL (see want_pages_array()). So the
> condition pages != stack_pages will never be true. Indeed, it looks
> like *all* callers of iov_iter_extract_pages() pass a non-NULL struct
> page **. Would it make sense for iov_iter_extract_pages() to require a
> pre-allocated pages array and remove support for allocating one?
I think the idea was to support this to replace existing users of
iov_iter_get_pages_alloc2. Which is urgently neede as those missing
the proper page pinning support. OTOH we should not keep dead code
around just in case.
For NFS, Pranjal has an initial series to convert away from
iov_iter_get_pages_alloc2, which makes use of the NULL pages argument
to iov_iter_extract_pages.
For 9p, Dominique had an untested patch in December that drops
iov_iter_get_pages_alloc2 in favor of a much better high level approach
that doesn't even involve iov_iter_extract_pages, which seems to not have
made it anywhere. It would be great to get this going again.
net/ceph/ needs to also do work. Not an expert on the code, but given
how it is based off the encryption flag it looks kinda fishy and
iov_iter_extract_pages might not be the best direct replacement.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-11 6:35 ` Christoph Hellwig
@ 2026-05-12 8:12 ` Dmitry Antipov
2026-05-12 9:07 ` Fedor Pchelkin
2026-05-12 11:23 ` Dmitry Antipov
2 siblings, 1 reply; 7+ messages in thread
From: Dmitry Antipov @ 2026-05-12 8:12 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, linux-block,
linux-fsdevel, lvc-project, Fedor Pchelkin
On Fri, 2026-05-08 at 11:33 -0700, Caleb Sander Mateos wrote:
> Indeed, it looks like *all* callers of iov_iter_extract_pages()
> pass a non-NULL struct page **.
This is not true for 6.12.x at least (where this issue was initially discovered)
where bio_map_user_iov() do the following:
...
while (iov_iter_count(iter)) {
struct page *stack_pages[UIO_FASTIOV];
struct page **pages = stack_pages;
ssize_t bytes;
size_t offs;
int npages;
if (nr_vecs > ARRAY_SIZE(stack_pages))
/* Stack pages aren't enough, so set 'pages' to NULL
and force allocation by want_pages_array(). */
pages = NULL;
bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
nr_vecs, extraction_flags, &offs);
if (unlikely(bytes <= 0)) {
/* If 'pages' was allocated (and so pages != stack_pages), memory leak here */
ret = bytes ? bytes : -EFAULT;
goto out_unmap;
}
For 6.12.x and may be other stables, the simplest possible fix is
https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u.
For upstream, it may be better to tweak iov_iter_extract_pages() itself, but
it may be required to adjust all of the callers.
Dmitry
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-12 8:12 ` Dmitry Antipov
@ 2026-05-12 9:07 ` Fedor Pchelkin
0 siblings, 0 replies; 7+ messages in thread
From: Fedor Pchelkin @ 2026-05-12 9:07 UTC (permalink / raw)
To: Dmitry Antipov, Caleb Sander Mateos
Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, linux-block,
linux-fsdevel, lvc-project
On Tue, 12. May 11:12, Dmitry Antipov wrote:
> On Fri, 2026-05-08 at 11:33 -0700, Caleb Sander Mateos wrote:
>
> > Indeed, it looks like *all* callers of iov_iter_extract_pages()
> > pass a non-NULL struct page **.
>
> This is not true for 6.12.x at least (where this issue was initially discovered)
> where bio_map_user_iov() do the following:
The current upstream has a caller of iov_iter_extract_pages() in
block/bio-integrity.c where NULL *@pages argument can be passed.
>
> For 6.12.x and may be other stables, the simplest possible fix is
> https://lore.kernel.org/stable/20260505094529.406783-1-dmantipov@yandex.ru/T/#u.
Older kernels have this pattern inside block/blk-map.c as well which you
fix with the linked patch. But there's still the same error left in
block/bio-integrity.c in 6.12.x, too. All the places ideally should be
fixed and this can be done as a couple of patches for 6.12.x. I'm not
against a step-by-step approach, but think this problem relates to the API
contract and all the cases should be considered as a whole if being
touched anyway.
>
> For upstream, it may be better to tweak iov_iter_extract_pages() itself, but
> it may be required to adjust all of the callers.
Taken the point that callers seem to be inconsistent about whether they
treat ret < 0 or ret <= 0 as the error case, it's up to them to decide
depending on their own logic. Not sure if we can suddenly convert all
'< 0' to '<= 0', this doesn't look correct.
And I see now that my initial suggestion to fix the problem in the callees
isn't so sweet and simple as it seemed, and agree with Caleb's and your
initial proposal to adjust the callers' side then.
In that case it'd be good to add some comment to iov_iter_extract_pages()
explaining that the caller is responsible for freeing the allocated memory
if it passes NULL *@pages argument - _even_ if iov_iter_extract_pages()
fails.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] lib: free pagelist on error in iov_iter_extract_pages()
2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-11 6:35 ` Christoph Hellwig
2026-05-12 8:12 ` Dmitry Antipov
@ 2026-05-12 11:23 ` Dmitry Antipov
2 siblings, 0 replies; 7+ messages in thread
From: Dmitry Antipov @ 2026-05-12 11:23 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, linux-block,
linux-fsdevel, lvc-project, Fedor Pchelkin
On Fri, 2026-05-08 at 11:33 -0700, Caleb Sander Mateos wrote:
> Callers seem to be inconsistent about whether they treat ret < 0
> or ret <= 0 as the error case, which is another argument not to
> handle freeing the pages array inside iov_iter_extract_pages().
Hmm... looking through iov_iter_extract_pages() innards, ret == 0 means something like
"no error but no data for further processing". Using iov_iter_extract_user_pages()
as an example, iov_iter_advance() is not called if pin_user_pages_fast() returns 0
and, since the pages array won't be touched anymore, it may be freed safely.
Dmitry
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-12 11:29 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-08 11:13 [PATCH] lib: free pagelist on error in iov_iter_extract_pages() Dmitry Antipov
2026-05-08 18:33 ` Caleb Sander Mateos
2026-05-11 6:35 ` Christoph Hellwig
2026-05-12 8:12 ` Dmitry Antipov
2026-05-12 9:07 ` Fedor Pchelkin
2026-05-12 11:23 ` Dmitry Antipov
2026-05-08 21:48 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox