* [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Christoph Hellwig @ 2024-10-24 5:00 UTC (permalink / raw)
To: axboe; +Cc: akpm, viro, dhowells, linux-block, linux-kernel, ming.lei
From: Ming Lei <ming.lei@redhat.com>
The iov_iter_extract_pages interface allows returning physically
discontiguous pages, as long as all but the first and last page
in the array are page-aligned and page-sized. Rewrite
iov_iter_extract_bvec_pages to take advantage of that instead of only
returning ranges of physically contiguous pages.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: minor cleanups, new commit log]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 lib/iov_iter.c | 67 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1abb32c0da50..9fc06f5fb748 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
 }
 
 /*
- * Extract a list of contiguous pages from an ITER_BVEC iterator. This does
- * not get references on the pages, nor does it get a pin on them.
+ * Extract a list of virtually contiguous pages from an ITER_BVEC iterator.
+ * This does not get references on the pages, nor does it get a pin on them.
  */
 static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 					   struct page ***pages, size_t maxsize,
@@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 					   iov_iter_extraction_t extraction_flags,
 					   size_t *offset0)
 {
-	struct page **p, *page;
-	size_t skip = i->iov_offset, offset, size;
-	int k;
+	size_t skip = i->iov_offset, size = 0;
+	struct bvec_iter bi;
+	int k = 0;
 
-	for (;;) {
-		if (i->nr_segs == 0)
-			return 0;
-		size = min(maxsize, i->bvec->bv_len - skip);
-		if (size)
-			break;
+	if (i->nr_segs == 0)
+		return 0;
+
+	if (i->iov_offset == i->bvec->bv_len) {
 		i->iov_offset = 0;
 		i->nr_segs--;
 		i->bvec++;
 		skip = 0;
 	}
+	bi.bi_size = maxsize + skip;
+	bi.bi_bvec_done = skip;
+
+	maxpages = want_pages_array(pages, maxsize, skip, maxpages);
+
+	while (bi.bi_size && bi.bi_idx < i->nr_segs) {
+		struct bio_vec bv = bvec_iter_bvec(i->bvec, bi);
+
+		/*
+		 * The iov_iter_extract_pages interface only allows an offset
+		 * into the first page. Break out of the loop if we see an
+		 * offset into subsequent pages, the caller will have to call
+		 * iov_iter_extract_pages again for the remainder.
+		 */
+		if (k) {
+			if (bv.bv_offset)
+				break;
+		} else {
+			*offset0 = bv.bv_offset;
+		}
 
-	skip += i->bvec->bv_offset;
-	page = i->bvec->bv_page + skip / PAGE_SIZE;
-	offset = skip % PAGE_SIZE;
-	*offset0 = offset;
+		(*pages)[k++] = bv.bv_page;
+		size += bv.bv_len;
 
-	maxpages = want_pages_array(pages, size, offset, maxpages);
-	if (!maxpages)
-		return -ENOMEM;
-	p = *pages;
-	for (k = 0; k < maxpages; k++)
-		p[k] = page + k;
+		if (k >= maxpages)
+			break;
+
+		/*
+		 * We are done when the end of the bvec doesn't align to a page
+		 * boundary as that would create a hole in the returned space.
+		 * The caller will handle this with another call to
+		 * iov_iter_extract_pages.
+		 */
+		if (bv.bv_offset + bv.bv_len != PAGE_SIZE)
+			break;
+
+		bvec_iter_advance_single(i->bvec, &bi, bv.bv_len);
+	}
 
-	size = min_t(size_t, size, maxpages * PAGE_SIZE - offset);
 	iov_iter_advance(i, size);
 	return size;
 }
--
2.45.2
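
For readers who want to experiment with the two stop rules from the patch above outside the kernel, the following is a rough user-space model, not kernel code. The names `struct model_seg`, `model_gather` and `MODEL_PAGE_SIZE` are made up for illustration, page identities are plain integers, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for one single-page bvec segment. */
struct model_seg {
	int page_id;		/* stands in for struct page * */
	unsigned int offset;	/* byte offset into the page */
	unsigned int len;	/* bytes used within the page */
};

/*
 * Gather segments into pages[] following the two stop rules of the patch:
 *  - only the first page may start at a non-zero offset;
 *  - every page except the last must end on a page boundary.
 * Returns the number of bytes gathered; *npages and *offset0 describe
 * the result.
 */
static size_t model_gather(const struct model_seg *segs, size_t nr_segs,
			   int *pages, size_t maxpages,
			   size_t *npages, unsigned int *offset0)
{
	size_t size = 0, k = 0, idx;

	*offset0 = 0;
	for (idx = 0; idx < nr_segs && k < maxpages; idx++) {
		const struct model_seg *s = &segs[idx];

		/* the model only handles segments that fit in one page */
		assert(s->offset + s->len <= MODEL_PAGE_SIZE);

		if (k == 0)
			*offset0 = s->offset;
		else if (s->offset)
			break;	/* offset into a later page: stop here */

		pages[k++] = s->page_id;
		size += s->len;

		if (s->offset + s->len != MODEL_PAGE_SIZE)
			break;	/* segment ends mid-page: would leave a hole */
	}
	*npages = k;
	return size;
}
```

Given four segments where the third one ends mid-page, the model returns the first three pages in one call and leaves the fourth for a second call, which is the behaviour the comments in the patch describe.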
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Jens Axboe @ 2024-10-29 15:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, viro, dhowells, linux-block, linux-kernel, ming.lei
On Thu, 24 Oct 2024 07:00:15 +0200, Christoph Hellwig wrote:
> The iov_iter_extract_pages interface allows to return physically
> discontiguous pages, as long as all but the first and last page
> in the array are page aligned and page size. Rewrite
> iov_iter_extract_bvec_pages to take advantage of that instead of only
> returning ranges of physically contiguous pages.
>
>
> [...]
Applied, thanks!
[1/1] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
(no commit info)
Best regards,
--
Jens Axboe
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Klara Modin @ 2024-10-30 17:56 UTC (permalink / raw)
To: Christoph Hellwig, axboe
Cc: akpm, viro, dhowells, linux-block, linux-kernel, ming.lei,
linux-nvme, klara
Hi,
On 2024-10-24 07:00, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> The iov_iter_extract_pages interface allows to return physically
> discontiguous pages, as long as all but the first and last page
> in the array are page aligned and page size. Rewrite
> iov_iter_extract_bvec_pages to take advantage of that instead of only
> returning ranges of physically contiguous pages.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [hch: minor cleanups, new commit log]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in
next-20241030), I'm unable to connect via nvme-tcp with this in the log:
nvme nvme1: failed to send request -5
nvme nvme1: Connect command failed: host path error
nvme nvme1: failed to connect queue: 0 ret=880
With the patch reverted it works as expected:
nvme nvme1: creating 24 I/O queues.
nvme nvme1: mapped 24/0/0 default/read/poll queues.
nvme nvme1: new ctrl: NQN
"nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr
[2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn:
nqn.2018-06.eu.kasm.int:parmesan
Please let me know if there's anything else you need.
Regards,
Klara Modin
+CC: linux-nvme
> [...]
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 46058 bytes --]
[-- Attachment #3: nvme-tcp_host-path-error_bisect --]
[-- Type: text/plain, Size: 2769 bytes --]
# bad: [cadd411a755d40bf717c2514afb90c7c0762aefc] crypto: rsassa-pkcs1 - Migrate to sig_alg backend
# good: [e42b1a9a2557aa94fee47f078633677198386a52] Merge tag 'spi-fix-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect start 'next' 'next/stable'
# good: [5837b9daa339313b9009011e0173dd874de3f132] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect good 5837b9daa339313b9009011e0173dd874de3f132
# bad: [64f1d5c3ad7542ea8f979988d2af75fd4e18148e] Merge branch 'for-backlight-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight.git
git bisect bad 64f1d5c3ad7542ea8f979988d2af75fd4e18148e
# good: [e7103f8785504dd5c6aad118fbc64fc49eda33af] Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good e7103f8785504dd5c6aad118fbc64fc49eda33af
# good: [7487abf914ecae6ad2690493c2a3fb998738bd71] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect good 7487abf914ecae6ad2690493c2a3fb998738bd71
# good: [3f743e703c251c9c3f22088bcdc0330e165c8c94] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect good 3f743e703c251c9c3f22088bcdc0330e165c8c94
# bad: [9401ff8e2d60f43ecf343c20a7595b2711bce217] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect bad 9401ff8e2d60f43ecf343c20a7595b2711bce217
# good: [aff750e7094e26eae686965930ef2bec7f4152da] io_uring/rsrc: clear ->buf before mapping pages
git bisect good aff750e7094e26eae686965930ef2bec7f4152da
# good: [904ebd2527c507752f5ddb358f887d2e0dab96a0] block: remove redundant explicit memory barrier from rq_qos waiter and waker
git bisect good 904ebd2527c507752f5ddb358f887d2e0dab96a0
# bad: [d49acf07fd5629a7e96d3f6cb4a28f5cc04a10bf] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git
git bisect bad d49acf07fd5629a7e96d3f6cb4a28f5cc04a10bf
# good: [f1be1788a32e8fa63416ad4518bbd1a85a825c9d] block: model freeze & enter queue as lock for supporting lockdep
git bisect good f1be1788a32e8fa63416ad4518bbd1a85a825c9d
# bad: [793c08dfe78b646031fe2aa5910e6fef6e872e4a] Merge branch 'for-6.13/block' into for-next
git bisect bad 793c08dfe78b646031fe2aa5910e6fef6e872e4a
# bad: [2f5a65ef30a636d5030917eebd283ac447a212af] block: add a bdev_limits helper
git bisect bad 2f5a65ef30a636d5030917eebd283ac447a212af
# bad: [e4e535bff2bc82bb49a633775f9834beeaa527db] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
git bisect bad e4e535bff2bc82bb49a633775f9834beeaa527db
# first bad commit: [e4e535bff2bc82bb49a633775f9834beeaa527db] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Ming Lei @ 2024-10-31 0:14 UTC (permalink / raw)
To: Klara Modin
Cc: Christoph Hellwig, axboe, akpm, viro, dhowells, linux-block,
linux-kernel, linux-nvme, klara
On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote:
> Hi,
>
> On 2024-10-24 07:00, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> >
> > The iov_iter_extract_pages interface allows to return physically
> > discontiguous pages, as long as all but the first and last page
> > in the array are page aligned and page size. Rewrite
> > iov_iter_extract_bvec_pages to take advantage of that instead of only
> > returning ranges of physically contiguous pages.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > [hch: minor cleanups, new commit log]
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030),
> I'm unable to connect via nvme-tcp with this in the log:
>
> nvme nvme1: failed to send request -5
> nvme nvme1: Connect command failed: host path error
> nvme nvme1: failed to connect queue: 0 ret=880
>
> With the patch reverted it works as expected:
>
> nvme nvme1: creating 24 I/O queues.
> nvme nvme1: mapped 24/0/0 default/read/poll queues.
> nvme nvme1: new ctrl: NQN
> "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr
> [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn:
> nqn.2018-06.eu.kasm.int:parmesan
I can't reproduce it by running blktests ('nvme_trtype=tcp ./check nvme/')
on either the next tree or for-6.13/block.
Can you collect the following bpftrace log by running the script before
connecting to nvme-tcp?
Please enable the following kernel options for bpftrace:
CONFIG_KPROBE_EVENTS_ON_NOTRACE=y
CONFIG_NVME_CORE=y
CONFIG_NVME_FABRICS=y
CONFIG_NVME_TCP=y
Btw, bpftrace doesn't work on the next tree if nvme is built as a module.
# cat extract.bt
#!/usr/bin/bpftrace

kprobe:nvmf_connect_io_queue
{
	@connect[tid]=1;
}

kretprobe:nvmf_connect_io_queue
{
	@connect[tid]=0;
}

kprobe:iov_iter_extract_pages
/@connect[tid]/
{
	$i = (struct iov_iter *)arg0;
	// offset0 is the sixth argument (arg5); arg4 is extraction_flags
	printf("extract pages: iter(cnt %lu off %lu) maxsize %u maxpages %u offset %lu\n",
		$i->count, $i->iov_offset, arg2, arg3, *((uint64 *)arg5));
	printf("\t bvec(off %u len %u)\n", $i->bvec->bv_offset, $i->bvec->bv_len);
}

kretprobe:iov_iter_extract_pages
/@connect[tid]/
{
	printf("extract pages: ret %d\n", retval);
}

END {
	clear(@connect);
}
Thanks,
Ming
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Ming Lei @ 2024-10-31 0:22 UTC (permalink / raw)
To: Klara Modin
Cc: Christoph Hellwig, axboe, akpm, viro, dhowells, linux-block,
linux-kernel, linux-nvme, klara
On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote:
> On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote:
> > Hi,
> >
> > On 2024-10-24 07:00, Christoph Hellwig wrote:
> > > From: Ming Lei <ming.lei@redhat.com>
> > >
> > > The iov_iter_extract_pages interface allows to return physically
> > > discontiguous pages, as long as all but the first and last page
> > > in the array are page aligned and page size. Rewrite
> > > iov_iter_extract_bvec_pages to take advantage of that instead of only
> > > returning ranges of physically contiguous pages.
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > [hch: minor cleanups, new commit log]
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >
> > With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030),
> > I'm unable to connect via nvme-tcp with this in the log:
> >
> > nvme nvme1: failed to send request -5
> > nvme nvme1: Connect command failed: host path error
> > nvme nvme1: failed to connect queue: 0 ret=880
> >
> > With the patch reverted it works as expected:
> >
> > nvme nvme1: creating 24 I/O queues.
> > nvme nvme1: mapped 24/0/0 default/read/poll queues.
> > nvme nvme1: new ctrl: NQN
> > "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr
> > [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn:
> > nqn.2018-06.eu.kasm.int:parmesan
>
> I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/'
> on both next tree & for-6.13/block.
>
> Can you collect the following bpftrace log by running the script before
> connecting to nvme-tcp?
And please try the following patch:
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 9fc06f5fb748..c761f6db3cb4 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 		i->bvec++;
 		skip = 0;
 	}
+	bi.bi_idx = 0;
 	bi.bi_size = maxsize + skip;
 	bi.bi_bvec_done = skip;
 
Thanks,
Ming
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Klara Modin @ 2024-10-31 8:42 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, axboe, akpm, viro, dhowells, linux-block,
linux-kernel, linux-nvme, klara
On 2024-10-31 01:22, Ming Lei wrote:
> On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote:
>> On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote:
>>> Hi,
>>>
>>> On 2024-10-24 07:00, Christoph Hellwig wrote:
>>>> From: Ming Lei <ming.lei@redhat.com>
>>>>
>>>> The iov_iter_extract_pages interface allows to return physically
>>>> discontiguous pages, as long as all but the first and last page
>>>> in the array are page aligned and page size. Rewrite
>>>> iov_iter_extract_bvec_pages to take advantage of that instead of only
>>>> returning ranges of physically contiguous pages.
>>>>
>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>> [hch: minor cleanups, new commit log]
>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>
>>> With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030),
>>> I'm unable to connect via nvme-tcp with this in the log:
>>>
>>> nvme nvme1: failed to send request -5
>>> nvme nvme1: Connect command failed: host path error
>>> nvme nvme1: failed to connect queue: 0 ret=880
>>>
>>> With the patch reverted it works as expected:
>>>
>>> nvme nvme1: creating 24 I/O queues.
>>> nvme nvme1: mapped 24/0/0 default/read/poll queues.
>>> nvme nvme1: new ctrl: NQN
>>> "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr
>>> [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn:
>>> nqn.2018-06.eu.kasm.int:parmesan
>>
>> I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/'
>> on both next tree & for-6.13/block.
>>
>> Can you collect the following bpftrace log by running the script before
>> connecting to nvme-tcp?
I didn't seem to get any output from the bpftrace script (I confirmed
that I had the config as you requested, but I'm not very familiar with
bpftrace so I could have done something wrong). I could, however,
reproduce the issue in qemu and added breakpoints on
nvmf_connect_io_queue and iov_iter_extract_pages. The breakpoint on
iov_iter_extract_pages got hit once when running nvme connect:
(gdb) break nvmf_connect_io_queue
Breakpoint 1 at 0xffffffff81a5d960: file
/home/klara/git/linux/drivers/nvme/host/fabrics.c, line 525.
(gdb) break iov_iter_extract_pages
Breakpoint 2 at 0xffffffff817633b0: file
/home/klara/git/linux/lib/iov_iter.c, line 1900.
(gdb) c
Continuing.
[Switching to Thread 1.1]
Thread 1 hit Breakpoint 2, iov_iter_extract_pages
(i=i@entry=0xffffc900001ebd68,
pages=pages@entry=0xffffc900001ebb08, maxsize=maxsize@entry=72,
maxpages=8,
extraction_flags=extraction_flags@entry=0,
offset0=offset0@entry=0xffffc900001ebb10)
at /home/klara/git/linux/lib/iov_iter.c:1900
1900 {
(gdb) print i->count
$5 = 72
(gdb) print i->iov_offset
$6 = 0
(gdb) print i->bvec->bv_offset
$7 = 3952
(gdb) print i->bvec->bv_len
$8 = 72
(gdb) c
Continuing.
I didn't hit the breakpoint in nvmf_connect_io_queue, but I instead hit
it if I add it to nvmf_connect_admin_queue. I added this function to the
bpftrace script but that didn't produce any output either.
>
> And please try the following patch:
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 9fc06f5fb748..c761f6db3cb4 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
> i->bvec++;
> skip = 0;
> }
> + bi.bi_idx = 0;
> bi.bi_size = maxsize + skip;
> bi.bi_bvec_done = skip;
>
>
Applying this seems to fix the problem.
Thanks,
Klara Modin
>
> Thanks,
> Ming
>
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Ming Lei @ 2024-10-31 11:17 UTC (permalink / raw)
To: Klara Modin
Cc: Christoph Hellwig, axboe, akpm, viro, dhowells, linux-block,
linux-kernel, linux-nvme, klara
On Thu, Oct 31, 2024 at 09:42:32AM +0100, Klara Modin wrote:
> On 2024-10-31 01:22, Ming Lei wrote:
> > On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote:
> > > On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote:
> > > > Hi,
> > > >
> > > > On 2024-10-24 07:00, Christoph Hellwig wrote:
> > > > > From: Ming Lei <ming.lei@redhat.com>
> > > > >
> > > > > The iov_iter_extract_pages interface allows to return physically
> > > > > discontiguous pages, as long as all but the first and last page
> > > > > in the array are page aligned and page size. Rewrite
> > > > > iov_iter_extract_bvec_pages to take advantage of that instead of only
> > > > > returning ranges of physically contiguous pages.
> > > > >
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > [hch: minor cleanups, new commit log]
> > > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > >
> > > > With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030),
> > > > I'm unable to connect via nvme-tcp with this in the log:
> > > >
> > > > nvme nvme1: failed to send request -5
> > > > nvme nvme1: Connect command failed: host path error
> > > > nvme nvme1: failed to connect queue: 0 ret=880
> > > >
> > > > With the patch reverted it works as expected:
> > > >
> > > > nvme nvme1: creating 24 I/O queues.
> > > > nvme nvme1: mapped 24/0/0 default/read/poll queues.
> > > > nvme nvme1: new ctrl: NQN
> > > > "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr
> > > > [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn:
> > > > nqn.2018-06.eu.kasm.int:parmesan
> > >
> > > I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/'
> > > on both next tree & for-6.13/block.
> > >
> > > Can you collect the following bpftrace log by running the script before
> > > connecting to nvme-tcp?
>
> I didn't seem to get any output from the bpftrace script (I confirmed that I
> had the config as you requested, but I'm not very familiar with bpftrace so
> I could have done something wrong). I could, however, reproduce the issue in
It works for me on Fedora (37 and 40).
> qemu and added breakpoints on nvmf_connect_io_queue and
> iov_iter_extract_pages. The breakpoint on iov_iter_extract_pages got hit
> once when running nvme connect:
>
> (gdb) break nvmf_connect_io_queue
> Breakpoint 1 at 0xffffffff81a5d960: file
> /home/klara/git/linux/drivers/nvme/host/fabrics.c, line 525.
> (gdb) break iov_iter_extract_pages
> Breakpoint 2 at 0xffffffff817633b0: file
> /home/klara/git/linux/lib/iov_iter.c, line 1900.
> (gdb) c
> Continuing.
> [Switching to Thread 1.1]
Wow, debug kernel with gdb, cool!
>
> Thread 1 hit Breakpoint 2, iov_iter_extract_pages
> (i=i@entry=0xffffc900001ebd68,
> pages=pages@entry=0xffffc900001ebb08, maxsize=maxsize@entry=72,
> maxpages=8,
> extraction_flags=extraction_flags@entry=0,
> offset0=offset0@entry=0xffffc900001ebb10)
> at /home/klara/git/linux/lib/iov_iter.c:1900
> 1900 {
> (gdb) print i->count
> $5 = 72
> (gdb) print i->iov_offset
> $6 = 0
> (gdb) print i->bvec->bv_offset
> $7 = 3952
> (gdb) print i->bvec->bv_len
> $8 = 72
> (gdb) c
> Continuing.
>
> I didn't hit the breakpoint in nvmf_connect_io_queue, but I instead hit it
> if I add it to nvmf_connect_admin_queue. I added this function to the
> bpftrace script but that didn't produce any output either.
Your kernel config shows that all BTF-related options are enabled, so maybe
it is a bpftrace userspace issue?
>
> >
> > And please try the following patch:
> >
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index 9fc06f5fb748..c761f6db3cb4 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
> > i->bvec++;
> > skip = 0;
> > }
> > + bi.bi_idx = 0;
> > bi.bi_size = maxsize + skip;
> > bi.bi_bvec_done = skip;
> >
> >
>
> Applying this seems to fix the problem.
Thanks for testing; the patch has been sent out.
thanks,
Ming
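
To see why the missing `bi.bi_idx` initialisation can matter for the values in the gdb session above, here is a small user-space model. It is only a sketch under stated assumptions: `model_extract` and `struct model_bvec` are hypothetical names, `start_idx` stands in for `bi.bi_idx`, a single-segment iterator and 4096-byte pages are assumed, and the thread does not say what value the uninitialised field actually held on the failing machine.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical stand-in for a single-page bio_vec. */
struct model_bvec {
	unsigned int offset;	/* bv_offset */
	unsigned int len;	/* bv_len */
};

/*
 * Model of the loop guard "bi.bi_size && bi.bi_idx < i->nr_segs".
 * start_idx plays the role of bi.bi_idx on entry to the loop.
 * The rule about non-zero offsets in later pages is left out for brevity.
 * Returns the number of bytes the loop would gather.
 */
static size_t model_extract(const struct model_bvec *segs, size_t nr_segs,
			    size_t start_idx, size_t maxsize)
{
	size_t size = 0, left = maxsize, idx = start_idx;

	while (left && idx < nr_segs) {
		size_t len = segs[idx].len < left ? segs[idx].len : left;

		size += len;
		left -= len;
		/* a segment that ends mid-page terminates the run */
		if (segs[idx].offset + len != MODEL_PAGE_SIZE)
			break;
		idx++;
	}
	return size;
}
```

With the values from the gdb session (`bv_offset` 3952, `bv_len` 72, `maxsize` 72), `model_extract(&seg, 1, 0, 72)` gathers 72 bytes, while any `start_idx` of 1 or more gathers nothing in this model.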
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Eric Dumazet @ 2024-11-01 17:05 UTC (permalink / raw)
To: Christoph Hellwig, axboe
Cc: akpm, viro, dhowells, linux-block, linux-kernel, ming.lei
On 10/24/24 7:00 AM, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> The iov_iter_extract_pages interface allows to return physically
> discontiguous pages, as long as all but the first and last page
> in the array are page aligned and page size. Rewrite
> iov_iter_extract_bvec_pages to take advantage of that instead of only
> returning ranges of physically contiguous pages.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [hch: minor cleanups, new commit log]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [...]
This is causing a major network regression in UDP sendfile, found by syzbot.
I will release the syzbot report together with this fix:
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65ec660c2960..e19aab1fccca 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1728,6 +1728,10 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 		(*pages)[k++] = bv.bv_page;
 		size += bv.bv_len;
 
+		if (size > maxsize) {
+			size = maxsize;
+			break;
+		}
 		if (k >= maxpages)
 			break;
 
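
The effect of the added clamp can be illustrated with a tiny user-space model. This is a sketch only: `model_total` is a made-up helper, the segment lengths are hypothetical, and the thread does not say which lengths triggered the syzbot report.

```c
#include <stddef.h>

/*
 * Sum per-segment lengths the way the loop accumulates "size".
 * With clamp != 0 the running total is capped at maxsize and the
 * loop stops, mirroring the check added by the fix above.
 */
static size_t model_total(const size_t *lens, size_t n, size_t maxsize,
			  int clamp)
{
	size_t size = 0, k;

	for (k = 0; k < n; k++) {
		size += lens[k];
		if (clamp && size > maxsize) {
			size = maxsize;
			break;
		}
	}
	return size;
}
```

For two hypothetical 4096-byte segments and a `maxsize` of 5000, the unclamped total is 8192, which is more than the caller asked for, while the clamped total is 5000.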
* Re: [PATCH] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
From: Jens Axboe @ 2024-11-01 18:00 UTC (permalink / raw)
To: Eric Dumazet, Christoph Hellwig
Cc: akpm, viro, dhowells, linux-block, linux-kernel, ming.lei
On 11/1/24 11:05 AM, Eric Dumazet wrote:
>
> On 10/24/24 7:00 AM, Christoph Hellwig wrote:
>> From: Ming Lei <ming.lei@redhat.com>
>>
>> The iov_iter_extract_pages interface allows to return physically
>> discontiguous pages, as long as all but the first and last page
>> in the array are page aligned and page size. Rewrite
>> iov_iter_extract_bvec_pages to take advantage of that instead of only
>> returning ranges of physically contiguous pages.
>>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>> [hch: minor cleanups, new commit log]
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> [...]
>
>
> This is causing major network regression in UDP sendfile, found by syzbot.
>
> I will release the syzbot report and this fix :
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 65ec660c2960..e19aab1fccca 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1728,6 +1728,10 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
> (*pages)[k++] = bv.bv_page;
> size += bv.bv_len;
>
> + if (size > maxsize) {
> + size = maxsize;
> + break;
> + }
> if (k >= maxpages)
> break;
Thanks Eric, I've applied your patch.
--
Jens Axboe