public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Christoph Hellwig <hch@lst.de>, axboe@kernel.dk
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
	dhowells@redhat.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
Date: Fri, 1 Nov 2024 18:05:49 +0100	[thread overview]
Message-ID: <fd0ec853-1f9c-4013-8b5d-89357594d02f@gmail.com> (raw)
In-Reply-To: <20241024050021.627350-1-hch@lst.de>


On 10/24/24 7:00 AM, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> The iov_iter_extract_pages interface allows to return physically
> discontiguous pages, as long as all but the first and last page
> in the array are page aligned and page size.  Rewrite
> iov_iter_extract_bvec_pages to take advantage of that instead of only
> returning ranges of physically contiguous pages.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [hch: minor cleanups, new commit log]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   lib/iov_iter.c | 67 +++++++++++++++++++++++++++++++++-----------------
>   1 file changed, 45 insertions(+), 22 deletions(-)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1abb32c0da50..9fc06f5fb748 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
>   }
>   
>   /*
> - * Extract a list of contiguous pages from an ITER_BVEC iterator.  This does
> - * not get references on the pages, nor does it get a pin on them.
> + * Extract a list of virtually contiguous pages from an ITER_BVEC iterator.
> + * This does not get references on the pages, nor does it get a pin on them.
>    */
>   static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
>   					   struct page ***pages, size_t maxsize,
> @@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
>   					   iov_iter_extraction_t extraction_flags,
>   					   size_t *offset0)
>   {
> -	struct page **p, *page;
> -	size_t skip = i->iov_offset, offset, size;
> -	int k;
> +	size_t skip = i->iov_offset, size = 0;
> +	struct bvec_iter bi;
> +	int k = 0;
>   
> -	for (;;) {
> -		if (i->nr_segs == 0)
> -			return 0;
> -		size = min(maxsize, i->bvec->bv_len - skip);
> -		if (size)
> -			break;
> +	if (i->nr_segs == 0)
> +		return 0;
> +
> +	if (i->iov_offset == i->bvec->bv_len) {
>   		i->iov_offset = 0;
>   		i->nr_segs--;
>   		i->bvec++;
>   		skip = 0;
>   	}
> +	bi.bi_size = maxsize + skip;
> +	bi.bi_bvec_done = skip;
> +
> +	maxpages = want_pages_array(pages, maxsize, skip, maxpages);
> +
> +	while (bi.bi_size && bi.bi_idx < i->nr_segs) {
> +		struct bio_vec bv = bvec_iter_bvec(i->bvec, bi);
> +
> +		/*
> +		 * The iov_iter_extract_pages interface only allows an offset
> +		 * into the first page.  Break out of the loop if we see an
> +		 * offset into subsequent pages, the caller will have to call
> +		 * iov_iter_extract_pages again for the reminder.
> +		 */
> +		if (k) {
> +			if (bv.bv_offset)
> +				break;
> +		} else {
> +			*offset0 = bv.bv_offset;
> +		}
>   
> -	skip += i->bvec->bv_offset;
> -	page = i->bvec->bv_page + skip / PAGE_SIZE;
> -	offset = skip % PAGE_SIZE;
> -	*offset0 = offset;
> +		(*pages)[k++] = bv.bv_page;
> +		size += bv.bv_len;
>   
> -	maxpages = want_pages_array(pages, size, offset, maxpages);
> -	if (!maxpages)
> -		return -ENOMEM;
> -	p = *pages;
> -	for (k = 0; k < maxpages; k++)
> -		p[k] = page + k;
> +		if (k >= maxpages)
> +			break;
> +
> +		/*
> +		 * We are done when the end of the bvec doesn't align to a page
> +		 * boundary as that would create a hole in the returned space.
> +		 * The caller will handle this with another call to
> +		 * iov_iter_extract_pages.
> +		 */
> +		if (bv.bv_offset + bv.bv_len != PAGE_SIZE)
> +			break;
> +
> +		bvec_iter_advance_single(i->bvec, &bi, bv.bv_len);
> +	}
>   
> -	size = min_t(size_t, size, maxpages * PAGE_SIZE - offset);
>   	iov_iter_advance(i, size);
>   	return size;
>   }


This is causing major network regression in UDP sendfile, found by syzbot.

I will release the syzbot report and this fix :

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65ec660c2960..e19aab1fccca 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1728,6 +1728,10 @@ static ssize_t iov_iter_extract_bvec_pages(struct 
iov_iter *i,
                 (*pages)[k++] = bv.bv_page;
                 size += bv.bv_len;

+               if (size > maxsize) {
+                       size = maxsize;
+                       break;
+               }
                 if (k >= maxpages)
                         break;



  parent reply	other threads:[~2024-11-01 17:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-24  5:00 [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages Christoph Hellwig
2024-10-29 15:26 ` Jens Axboe
2024-10-30 17:56 ` Klara Modin
2024-10-31  0:14   ` Ming Lei
2024-10-31  0:22     ` Ming Lei
2024-10-31  8:42       ` Klara Modin
2024-10-31 11:17         ` Ming Lei
2024-11-01 17:05 ` Eric Dumazet [this message]
2024-11-01 18:00   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fd0ec853-1f9c-4013-8b5d-89357594d02f@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=dhowells@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox