From: Sagi Grimberg <sagi@grimberg.me>
To: Christoph Hellwig <hch@lst.de>
Cc: Ofir Gal <ofir.gal@volumez.com>,
davem@davemloft.net, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
ceph-devel@vger.kernel.org, dhowells@redhat.com,
edumazet@google.com, pabeni@redhat.com, kbusch@kernel.org,
axboe@kernel.dk, philipp.reisner@linbit.com,
lars.ellenberg@linbit.com, christoph.boehmwalder@linbit.com,
idryomov@gmail.com, xiubli@redhat.com
Subject: Re: [PATCH v2 1/4] net: introduce helper sendpages_ok()
Date: Tue, 4 Jun 2024 16:01:17 +0300 [thread overview]
Message-ID: <ef7ea4a8-c0e4-4fd9-9abb-42ae95090fc8@grimberg.me> (raw)
In-Reply-To: <62c2b8cd-ce6a-4e13-a58c-a6b30a0dcf17@grimberg.me>
On 04/06/2024 11:24, Sagi Grimberg wrote:
>
>
> On 04/06/2024 7:27, Christoph Hellwig wrote:
>> On Tue, Jun 04, 2024 at 12:27:06AM +0300, Sagi Grimberg wrote:
>>>>> I still don't understand how a page in the middle of a contiguous
>>>>> range ends
>>>>> up coming from the slab while others don't.
>>>> I haven't investigated the origin of the IO yet. I suspect the
>>>> first 2 pages are the superblocks of the raid (mdp_superblock_1 and
>>>> bitmap_super_s) and the rest of the IO is the bitmap.
>>> Well, if these indeed are different origins and just *happen* to be a
>>> mixture of slab originated pages and non-slab pages combined together
>>> in a single bvec entry of a bio, I'd suspect that it would be more
>>> beneficial to split the bvec (essentially not allow bio_add_page to
>>> append the page to the tail bvec) depending on a queue limit (similar
>>> to how we handle sg gaps).
>> So you want to add a PageSlab check to bvec_try_merge_page? That sounds
>> fairly expensive..
>>
>
> The check needs to happen somewhere apparently, and given that it will
> be gated by a queue flag, only request queues that actually need it
> will pay the cost, but they would pay it anyways...
Something like the untested patch below:
--
diff --git a/block/bio.c b/block/bio.c
index 53f608028c78..e55a4184c0e6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -18,6 +18,7 @@
#include <linux/highmem.h>
#include <linux/blk-crypto.h>
#include <linux/xarray.h>
+#include <linux/net.h>
#include <trace/events/block.h>
#include "blk.h"
@@ -960,6 +961,9 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
return false;
 	if (len > queue_max_segment_size(q) - bv->bv_len)
 		return false;
+	if (q->limits.splice_pages &&
+	    (sendpage_ok(bv->bv_page) ^ sendpage_ok(page)))
+		return false;
return bvec_try_merge_page(bv, page, len, offset, same_page);
}
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a7e820840cf7..82e2719acb9c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1937,6 +1937,7 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
lim->max_segment_size = UINT_MAX;
lim->dma_alignment = 3;
+ lim->splice_pages = ctrl->splice_pages;
}
static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3f3e26849b61..d9818330e236 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -398,6 +398,7 @@ struct nvme_ctrl {
enum nvme_ctrl_type cntrltype;
enum nvme_dctype dctype;
+	bool splice_pages;
};
static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 02076b8cb4d8..618b8f20206a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2146,6 +2146,12 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
if (error)
goto out_stop_queue;
+	/*
+	 * we want to prevent contig pages with conflicting splice-ability
+	 * with respect to the network transmission
+	 */
+ ctrl->splice_pages = true;
+
nvme_unquiesce_admin_queue(ctrl);
error = nvme_init_ctrl_finish(ctrl, false);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 69c4f113db42..ec657ddad2a4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -331,6 +331,12 @@ struct queue_limits {
* due to possible offsets.
*/
unsigned int dma_alignment;
+
+	/*
+	 * Drivers that use MSG_SPLICE_PAGES to send the bvec over the
+	 * network need each bvec entry's contiguous pages to be either
+	 * all spliceable or none.
+	 */
+ bool splice_pages;
};
typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx,
--
What I now see is that we will check PageSlab twice (once for the tail
bvec and once for the appended page), and skb_splice_from_iter() checks
it again... How many times will we check this :)
Would be great if the network stack could just check it once and fall
back to a page copy...
Thread overview: 30+ messages
2024-05-30 14:24 [PATCH v2 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
2024-05-30 14:24 ` [PATCH v2 1/4] net: introduce helper sendpages_ok() Ofir Gal
2024-05-31 7:32 ` Christoph Hellwig
2024-05-31 8:51 ` Sagi Grimberg
2024-06-03 12:35 ` Ofir Gal
2024-06-03 21:27 ` Sagi Grimberg
2024-06-04 4:27 ` Christoph Hellwig
2024-06-04 8:24 ` Sagi Grimberg
2024-06-04 13:01 ` Sagi Grimberg [this message]
2024-06-06 12:57 ` Ofir Gal
2024-06-06 13:08 ` Christoph Hellwig
2024-06-06 13:18 ` Ofir Gal
2024-06-06 13:52 ` Christoph Hellwig
2024-06-06 15:42 ` Ofir Gal
2024-05-30 14:24 ` [PATCH v2 2/4] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
2024-05-31 7:32 ` Christoph Hellwig
2024-05-30 14:24 ` [PATCH v2 3/4] drbd: " Ofir Gal
2024-06-04 14:43 ` Christoph Böhmwalder
2024-05-30 14:24 ` [PATCH v2 4/4] libceph: " Ofir Gal
2024-05-31 7:32 ` [PATCH v2 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Christoph Hellwig
2024-06-01 22:36 ` Jakub Kicinski
2024-06-04 4:30 ` Christoph Hellwig
2024-06-04 14:42 ` Jakub Kicinski
2024-06-05 7:27 ` Christoph Hellwig
2024-06-01 22:34 ` Jakub Kicinski
2024-06-02 7:48 ` Sagi Grimberg
2024-06-03 9:07 ` Hannes Reinecke
2024-06-03 12:46 ` Ofir Gal
2024-06-03 7:24 ` Hannes Reinecke
2024-06-03 12:49 ` Ofir Gal