* [PATCH v6 01/11] 9p/trans_virtio: separate allocation of scatter gather list
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
The scatter gather list in struct virtio_chan currently resides as a
compile-time constant-size array directly within struct virtio_chan's
contiguous memory space.
Separate the memory space and allocation of the scatter gather list
from the memory space and allocation of struct virtio_chan.
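The allocation/cleanup pairing then roughly looks like this (simplified
sketch of the probe() error path, not verbatim from the patch):
  chan->sg = kmalloc_array(VIRTQUEUE_NUM, sizeof(struct scatterlist),
                           GFP_KERNEL);
  if (!chan->sg)
          goto out_free_chan_shallow;   /* frees only the chan struct */
  ...
  out_free_chan:
          kfree(chan->sg);              /* deep free: sg list first ... */
  out_free_chan_shallow:
          kfree(chan);                  /* ... then the chan struct itself */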
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/trans_virtio.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index b24a4fb0f0a2..2693e618080c 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -77,7 +77,7 @@ struct virtio_chan {
*/
unsigned long p9_max_pages;
/* Scatterlist: can be too big for stack. */
- struct scatterlist sg[VIRTQUEUE_NUM];
+ struct scatterlist *sg;
/**
* @tag: name to identify a mount null terminated
*/
@@ -574,6 +574,14 @@ static int p9_virtio_probe(struct virtio_device *vdev)
goto fail;
}
+ chan->sg = kmalloc_array(VIRTQUEUE_NUM,
+ sizeof(struct scatterlist), GFP_KERNEL);
+ if (!chan->sg) {
+ pr_err("Failed to allocate virtio 9P channel\n");
+ err = -ENOMEM;
+ goto out_free_chan_shallow;
+ }
+
chan->vdev = vdev;
/* We expect one virtqueue, for requests. */
@@ -635,6 +643,8 @@ static int p9_virtio_probe(struct virtio_device *vdev)
out_free_vq:
vdev->config->del_vqs(vdev);
out_free_chan:
+ kfree(chan->sg);
+out_free_chan_shallow:
kfree(chan);
fail:
return err;
@@ -728,6 +738,7 @@ static void p9_virtio_remove(struct virtio_device *vdev)
kobject_uevent(&(vdev->dev.kobj), KOBJ_CHANGE);
kfree(chan->tag);
kfree(chan->vc_wq);
+ kfree(chan->sg);
kfree(chan);
}
--
2.30.2
* [PATCH v6 02/11] 9p/trans_virtio: turn amount of sg lists into runtime info
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
The size of the scatter/gather lists used by the virtio transport is
currently hard coded. Refactor this to use a runtime variable instead.
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/trans_virtio.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 2693e618080c..18bdfa64b934 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -36,7 +36,7 @@
#include <linux/virtio_9p.h>
#include "trans_common.h"
-#define VIRTQUEUE_NUM 128
+#define VIRTQUEUE_DEFAULT_NUM 128
/* a single mutex to manage channel initialization and attachment */
static DEFINE_MUTEX(virtio_9p_lock);
@@ -54,6 +54,7 @@ static atomic_t vp_pinned = ATOMIC_INIT(0);
* @vc_wq: wait queue for waiting for thing to be added to ring buf
* @p9_max_pages: maximum number of pinned pages
* @sg: scatter gather list which is used to pack a request (protected?)
+ * @sg_n: amount of elements in sg array
* @chan_list: linked list of channels
*
* We keep all per-channel information in a structure.
@@ -78,6 +79,7 @@ struct virtio_chan {
unsigned long p9_max_pages;
/* Scatterlist: can be too big for stack. */
struct scatterlist *sg;
+ size_t sg_n;
/**
* @tag: name to identify a mount null terminated
*/
@@ -270,12 +272,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
out_sgs = in_sgs = 0;
/* Handle out VirtIO ring buffers */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
+ chan->sg_n, req->tc.sdata, req->tc.size);
if (out)
sgs[out_sgs++] = chan->sg;
in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
+ chan->sg_n, req->rc.sdata, req->rc.capacity);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;
@@ -447,14 +449,14 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
/* out data */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
+ chan->sg_n, req->tc.sdata, req->tc.size);
if (out)
sgs[out_sgs++] = chan->sg;
if (out_pages) {
sgs[out_sgs++] = chan->sg + out;
- out += pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
+ out += pack_sg_list_p(chan->sg, out, chan->sg_n,
out_pages, out_nr_pages, offs, outlen);
}
@@ -466,13 +468,13 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* allocated memory and payload onto the user buffer.
*/
in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
+ chan->sg_n, req->rc.sdata, in_hdr_len);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;
if (in_pages) {
sgs[out_sgs + in_sgs++] = chan->sg + out + in;
- in += pack_sg_list_p(chan->sg, out + in, VIRTQUEUE_NUM,
+ in += pack_sg_list_p(chan->sg, out + in, chan->sg_n,
in_pages, in_nr_pages, offs, inlen);
}
@@ -574,13 +576,14 @@ static int p9_virtio_probe(struct virtio_device *vdev)
goto fail;
}
- chan->sg = kmalloc_array(VIRTQUEUE_NUM,
+ chan->sg = kmalloc_array(VIRTQUEUE_DEFAULT_NUM,
sizeof(struct scatterlist), GFP_KERNEL);
if (!chan->sg) {
pr_err("Failed to allocate virtio 9P channel\n");
err = -ENOMEM;
goto out_free_chan_shallow;
}
+ chan->sg_n = VIRTQUEUE_DEFAULT_NUM;
chan->vdev = vdev;
@@ -593,7 +596,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
chan->vq->vdev->priv = chan;
spin_lock_init(&chan->lock);
- sg_init_table(chan->sg, VIRTQUEUE_NUM);
+ sg_init_table(chan->sg, chan->sg_n);
chan->inuse = false;
if (virtio_has_feature(vdev, VIRTIO_9P_MOUNT_TAG)) {
@@ -777,7 +780,7 @@ static struct p9_trans_module p9_virtio_trans = {
* that are not at page boundary, that can result in an extra
* page in zero copy.
*/
- .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
+ .maxsize = PAGE_SIZE * (VIRTQUEUE_DEFAULT_NUM - 3),
.def = 1,
.owner = THIS_MODULE,
};
--
2.30.2
* [PATCH v6 03/11] 9p/trans_virtio: introduce struct virtqueue_sg
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
The number of elements in a scatter/gather list is limited to
approximately 128. To allow going beyond that limit with
subsequent patches, pave the way by turning the one-dimensional
sg list array into a two-dimensional array, i.e.:
sg[128]
becomes
sgl[nsgl][SG_MAX_SINGLE_ALLOC]
As the value of 'nsgl' is (still) exactly 1 in this commit, and
the compile-time (compiler and architecture dependent) value of
'SG_MAX_SINGLE_ALLOC' roughly equals the previous hard coded 128
elements, this commit is more of a preparatory refactoring than
an actual behaviour change.
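As a quick illustration of the intended indexing (assuming a value of
SG_MAX_SINGLE_ALLOC of 128, i.e. 127 user pages per list once the chain
entry is reserved; this just mirrors vq_sg_page() below):
  node = page / 127;      /* which chained sg list */
  leaf = page % 127;      /* entry within that list */
  /* e.g. page index 200 -> sgl[1][73] */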
A custom struct virtqueue_sg is defined instead of using the
shared API struct sg_table, because the latter would not allow
resizing the table after allocation. The sg_append_table API, on
the other hand, would not fit either, because it requires a list
of pages upfront at allocation time. And both APIs only support
all-or-nothing allocation.
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/trans_virtio.c | 193 ++++++++++++++++++++++++++++++++----------
1 file changed, 147 insertions(+), 46 deletions(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 18bdfa64b934..5ac533f83322 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -36,7 +36,31 @@
#include <linux/virtio_9p.h>
#include "trans_common.h"
-#define VIRTQUEUE_DEFAULT_NUM 128
+/**
+ * struct virtqueue_sg - (chained) scatter gather lists for virtqueue data
+ * transmission
+ * @nsgl: amount of elements (in first dimension) of array field @sgl
+ * @sgl: two-dimensional array, i.e. sgl[nsgl][SG_MAX_SINGLE_ALLOC]
+ */
+struct virtqueue_sg {
+ unsigned int nsgl;
+ struct scatterlist *sgl[];
+};
+
+/*
+ * Default value for field nsgl in struct virtqueue_sg, which defines the
+ * initial virtio data transmission capacity when this virtio transport is
+ * probed.
+ */
+#define VIRTQUEUE_SG_NSGL_DEFAULT 1
+
+/* maximum value for field nsgl in struct virtqueue_sg */
+#define VIRTQUEUE_SG_NSGL_MAX \
+ ((PAGE_SIZE - sizeof(struct virtqueue_sg)) / \
+ sizeof(struct scatterlist *)) \
+
+/* last entry per sg list is used for chaining (pointer to next list) */
+#define SG_USER_PAGES_PER_LIST (SG_MAX_SINGLE_ALLOC - 1)
/* a single mutex to manage channel initialization and attachment */
static DEFINE_MUTEX(virtio_9p_lock);
@@ -53,8 +77,7 @@ static atomic_t vp_pinned = ATOMIC_INIT(0);
* @ring_bufs_avail: flag to indicate there is some available in the ring buf
* @vc_wq: wait queue for waiting for thing to be added to ring buf
* @p9_max_pages: maximum number of pinned pages
- * @sg: scatter gather list which is used to pack a request (protected?)
- * @sg_n: amount of elements in sg array
+ * @vq_sg: table of scatter gather lists, which are used to pack a request
* @chan_list: linked list of channels
*
* We keep all per-channel information in a structure.
@@ -77,9 +100,7 @@ struct virtio_chan {
* will be placing it in each channel.
*/
unsigned long p9_max_pages;
- /* Scatterlist: can be too big for stack. */
- struct scatterlist *sg;
- size_t sg_n;
+ struct virtqueue_sg *vq_sg;
/**
* @tag: name to identify a mount null terminated
*/
@@ -96,6 +117,92 @@ static unsigned int rest_of_page(void *data)
return PAGE_SIZE - offset_in_page(data);
}
+/**
+ * vq_sg_page - returns user page for given page index
+ * @vq_sg: scatter gather lists used by this transport
+ * @page: user page index across all scatter gather lists
+ */
+static struct scatterlist *vq_sg_page(struct virtqueue_sg *vq_sg, size_t page)
+{
+ unsigned int node = page / SG_USER_PAGES_PER_LIST;
+ unsigned int leaf = page % SG_USER_PAGES_PER_LIST;
+ BUG_ON(node >= vq_sg->nsgl);
+ return &vq_sg->sgl[node][leaf];
+}
+
+/**
+ * vq_sg_npages - returns total number of individual user pages in passed
+ * scatter gather lists
+ * @vq_sg: scatter gather lists to be counted
+ */
+static size_t vq_sg_npages(struct virtqueue_sg *vq_sg)
+{
+ return vq_sg->nsgl * SG_USER_PAGES_PER_LIST;
+}
+
+/**
+ * vq_sg_free - free all memory previously allocated for @vq_sg
+ * @vq_sg: scatter gather lists to be freed
+ */
+static void vq_sg_free(struct virtqueue_sg *vq_sg)
+{
+ unsigned int i;
+
+ if (!vq_sg)
+ return;
+
+ for (i = 0; i < vq_sg->nsgl; ++i) {
+ kfree(vq_sg->sgl[i]);
+ }
+ kfree(vq_sg);
+}
+
+/**
+ * vq_sg_alloc - allocates and returns @nsgl scatter gather lists
+ * @nsgl: amount of scatter gather lists to be allocated
+ * If @nsgl is larger than one then chained lists are used if supported by
+ * architecture.
+ */
+static struct virtqueue_sg *vq_sg_alloc(unsigned int nsgl)
+{
+ struct virtqueue_sg *vq_sg;
+ unsigned int i;
+
+ BUG_ON(!nsgl || nsgl > VIRTQUEUE_SG_NSGL_MAX);
+#ifdef CONFIG_ARCH_NO_SG_CHAIN
+ if (WARN_ON_ONCE(nsgl > 1))
+ return NULL;
+#endif
+
+ vq_sg = kzalloc(sizeof(struct virtqueue_sg) +
+ nsgl * sizeof(struct scatterlist *),
+ GFP_KERNEL);
+
+ if (!vq_sg)
+ return NULL;
+
+ vq_sg->nsgl = nsgl;
+
+ for (i = 0; i < nsgl; ++i) {
+ vq_sg->sgl[i] = kmalloc_array(
+ SG_MAX_SINGLE_ALLOC, sizeof(struct scatterlist),
+ GFP_KERNEL
+ );
+ if (!vq_sg->sgl[i]) {
+ vq_sg_free(vq_sg);
+ return NULL;
+ }
+ sg_init_table(vq_sg->sgl[i], SG_MAX_SINGLE_ALLOC);
+ if (i) {
+ /* chain the lists */
+ sg_chain(vq_sg->sgl[i - 1], SG_MAX_SINGLE_ALLOC,
+ vq_sg->sgl[i]);
+ }
+ }
+ sg_mark_end(&vq_sg->sgl[nsgl - 1][SG_MAX_SINGLE_ALLOC - 1]);
+ return vq_sg;
+}
+
/**
* p9_virtio_close - reclaim resources of a channel
* @client: client instance
@@ -158,9 +265,8 @@ static void req_done(struct virtqueue *vq)
/**
* pack_sg_list - pack a scatter gather list from a linear buffer
- * @sg: scatter/gather list to pack into
+ * @vq_sg: scatter/gather lists to pack into
* @start: which segment of the sg_list to start at
- * @limit: maximum segment to pack data to
* @data: data to pack into scatter/gather list
* @count: amount of data to pack into the scatter/gather list
*
@@ -170,11 +276,12 @@ static void req_done(struct virtqueue *vq)
*
*/
-static int pack_sg_list(struct scatterlist *sg, int start,
- int limit, char *data, int count)
+static int pack_sg_list(struct virtqueue_sg *vq_sg, int start,
+ char *data, int count)
{
int s;
int index = start;
+ size_t limit = vq_sg_npages(vq_sg);
while (count) {
s = rest_of_page(data);
@@ -182,13 +289,13 @@ static int pack_sg_list(struct scatterlist *sg, int start,
s = count;
BUG_ON(index >= limit);
/* Make sure we don't terminate early. */
- sg_unmark_end(&sg[index]);
- sg_set_buf(&sg[index++], data, s);
+ sg_unmark_end(vq_sg_page(vq_sg, index));
+ sg_set_buf(vq_sg_page(vq_sg, index++), data, s);
count -= s;
data += s;
}
if (index-start)
- sg_mark_end(&sg[index - 1]);
+ sg_mark_end(vq_sg_page(vq_sg, index - 1));
return index-start;
}
@@ -208,21 +315,21 @@ static int p9_virtio_cancelled(struct p9_client *client, struct p9_req_t *req)
/**
* pack_sg_list_p - Just like pack_sg_list. Instead of taking a buffer,
* this takes a list of pages.
- * @sg: scatter/gather list to pack into
+ * @vq_sg: scatter/gather lists to pack into
* @start: which segment of the sg_list to start at
- * @limit: maximum number of pages in sg list.
* @pdata: a list of pages to add into sg.
* @nr_pages: number of pages to pack into the scatter/gather list
* @offs: amount of data in the beginning of first page _not_ to pack
* @count: amount of data to pack into the scatter/gather list
*/
static int
-pack_sg_list_p(struct scatterlist *sg, int start, int limit,
+pack_sg_list_p(struct virtqueue_sg *vq_sg, int start,
struct page **pdata, int nr_pages, size_t offs, int count)
{
int i = 0, s;
int data_off = offs;
int index = start;
+ size_t limit = vq_sg_npages(vq_sg);
BUG_ON(nr_pages > (limit - start));
/*
@@ -235,15 +342,16 @@ pack_sg_list_p(struct scatterlist *sg, int start, int limit,
s = count;
BUG_ON(index >= limit);
/* Make sure we don't terminate early. */
- sg_unmark_end(&sg[index]);
- sg_set_page(&sg[index++], pdata[i++], s, data_off);
+ sg_unmark_end(vq_sg_page(vq_sg, index));
+ sg_set_page(vq_sg_page(vq_sg, index++), pdata[i++], s,
+ data_off);
data_off = 0;
count -= s;
nr_pages--;
}
if (index-start)
- sg_mark_end(&sg[index - 1]);
+ sg_mark_end(vq_sg_page(vq_sg, index - 1));
return index - start;
}
@@ -271,15 +379,13 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
out_sgs = in_sgs = 0;
/* Handle out VirtIO ring buffers */
- out = pack_sg_list(chan->sg, 0,
- chan->sg_n, req->tc.sdata, req->tc.size);
+ out = pack_sg_list(chan->vq_sg, 0, req->tc.sdata, req->tc.size);
if (out)
- sgs[out_sgs++] = chan->sg;
+ sgs[out_sgs++] = vq_sg_page(chan->vq_sg, 0);
- in = pack_sg_list(chan->sg, out,
- chan->sg_n, req->rc.sdata, req->rc.capacity);
+ in = pack_sg_list(chan->vq_sg, out, req->rc.sdata, req->rc.capacity);
if (in)
- sgs[out_sgs + in_sgs++] = chan->sg + out;
+ sgs[out_sgs + in_sgs++] = vq_sg_page(chan->vq_sg, out);
err = virtqueue_add_sgs(chan->vq, sgs, out_sgs, in_sgs, req,
GFP_ATOMIC);
@@ -448,16 +554,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
out_sgs = in_sgs = 0;
/* out data */
- out = pack_sg_list(chan->sg, 0,
- chan->sg_n, req->tc.sdata, req->tc.size);
+ out = pack_sg_list(chan->vq_sg, 0, req->tc.sdata, req->tc.size);
if (out)
- sgs[out_sgs++] = chan->sg;
+ sgs[out_sgs++] = vq_sg_page(chan->vq_sg, 0);
if (out_pages) {
- sgs[out_sgs++] = chan->sg + out;
- out += pack_sg_list_p(chan->sg, out, chan->sg_n,
- out_pages, out_nr_pages, offs, outlen);
+ sgs[out_sgs++] = vq_sg_page(chan->vq_sg, out);
+ out += pack_sg_list_p(chan->vq_sg, out, out_pages,
+ out_nr_pages, offs, outlen);
}
/*
@@ -467,15 +572,14 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* Arrange in such a way that server places header in the
* allocated memory and payload onto the user buffer.
*/
- in = pack_sg_list(chan->sg, out,
- chan->sg_n, req->rc.sdata, in_hdr_len);
+ in = pack_sg_list(chan->vq_sg, out, req->rc.sdata, in_hdr_len);
if (in)
- sgs[out_sgs + in_sgs++] = chan->sg + out;
+ sgs[out_sgs + in_sgs++] = vq_sg_page(chan->vq_sg, out);
if (in_pages) {
- sgs[out_sgs + in_sgs++] = chan->sg + out + in;
- in += pack_sg_list_p(chan->sg, out + in, chan->sg_n,
- in_pages, in_nr_pages, offs, inlen);
+ sgs[out_sgs + in_sgs++] = vq_sg_page(chan->vq_sg, out + in);
+ in += pack_sg_list_p(chan->vq_sg, out + in, in_pages,
+ in_nr_pages, offs, inlen);
}
BUG_ON(out_sgs + in_sgs > ARRAY_SIZE(sgs));
@@ -576,14 +680,12 @@ static int p9_virtio_probe(struct virtio_device *vdev)
goto fail;
}
- chan->sg = kmalloc_array(VIRTQUEUE_DEFAULT_NUM,
- sizeof(struct scatterlist), GFP_KERNEL);
- if (!chan->sg) {
+ chan->vq_sg = vq_sg_alloc(VIRTQUEUE_SG_NSGL_DEFAULT);
+ if (!chan->vq_sg) {
pr_err("Failed to allocate virtio 9P channel\n");
err = -ENOMEM;
goto out_free_chan_shallow;
}
- chan->sg_n = VIRTQUEUE_DEFAULT_NUM;
chan->vdev = vdev;
@@ -596,8 +698,6 @@ static int p9_virtio_probe(struct virtio_device *vdev)
chan->vq->vdev->priv = chan;
spin_lock_init(&chan->lock);
- sg_init_table(chan->sg, chan->sg_n);
-
chan->inuse = false;
if (virtio_has_feature(vdev, VIRTIO_9P_MOUNT_TAG)) {
virtio_cread(vdev, struct virtio_9p_config, tag_len, &tag_len);
@@ -646,7 +746,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
out_free_vq:
vdev->config->del_vqs(vdev);
out_free_chan:
- kfree(chan->sg);
+ vq_sg_free(chan->vq_sg);
out_free_chan_shallow:
kfree(chan);
fail:
@@ -741,7 +841,7 @@ static void p9_virtio_remove(struct virtio_device *vdev)
kobject_uevent(&(vdev->dev.kobj), KOBJ_CHANGE);
kfree(chan->tag);
kfree(chan->vc_wq);
- kfree(chan->sg);
+ vq_sg_free(chan->vq_sg);
kfree(chan);
}
@@ -780,7 +880,8 @@ static struct p9_trans_module p9_virtio_trans = {
* that are not at page boundary, that can result in an extra
* page in zero copy.
*/
- .maxsize = PAGE_SIZE * (VIRTQUEUE_DEFAULT_NUM - 3),
+ .maxsize = PAGE_SIZE *
+ ((VIRTQUEUE_SG_NSGL_DEFAULT * SG_USER_PAGES_PER_LIST) - 3),
.def = 1,
.owner = THIS_MODULE,
};
--
2.30.2
* [PATCH v6 04/11] net/9p: add trans_maxsize to struct p9_client
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
This new field 'trans_maxsize' optionally allows the transport to
update it to reflect the actual maximum msize supported by the
allocated transport channel.
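A rough sketch of how a transport's create() callback might make use of
the new field ('channel_limit' and the function name below are made up
for illustration only, they are not part of this patch):
  static int example_create(struct p9_client *client, const char *addr,
                            char *args)
  {
          /* ... allocate and attach the channel ... */
          /* raise the cap if this channel supports more than the
           * transport's static ->maxsize default
           */
          if (channel_limit > client->trans_maxsize)
                  client->trans_maxsize = channel_limit;
          return 0;
  }
The virtio transport makes actual use of this in a subsequent patch of
this series.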
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
include/net/9p/client.h | 2 ++
net/9p/client.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index ec1d1706f43c..f5718057fca4 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -87,6 +87,7 @@ struct p9_req_t {
* struct p9_client - per client instance state
* @lock: protect @fids and @reqs
* @msize: maximum data size negotiated by protocol
+ * @trans_maxsize: actual maximum msize supported by transport channel
* @proto_version: 9P protocol version to use
* @trans_mod: module API instantiated with this client
* @status: connection state
@@ -101,6 +102,7 @@ struct p9_req_t {
struct p9_client {
spinlock_t lock;
unsigned int msize;
+ unsigned int trans_maxsize;
unsigned char proto_version;
struct p9_trans_module *trans_mod;
enum p9_trans_status status;
diff --git a/net/9p/client.c b/net/9p/client.c
index 8bba0d9cf975..20054addd81b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1031,6 +1031,14 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
goto free_client;
}
+ /*
+ * transport will get a chance to increase trans_maxsize (if
+ * necessary) and it may update trans_maxsize in create() function
+ * below accordingly to reflect the actual maximum size supported by
+ * the allocated transport channel
+ */
+ clnt->trans_maxsize = clnt->trans_mod->maxsize;
+
p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);
@@ -1038,8 +1046,8 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
if (err)
goto put_trans;
- if (clnt->msize > clnt->trans_mod->maxsize) {
- clnt->msize = clnt->trans_mod->maxsize;
+ if (clnt->msize > clnt->trans_maxsize) {
+ clnt->msize = clnt->trans_maxsize;
pr_info("Limiting 'msize' to %d as this is the maximum "
"supported by transport %s\n",
clnt->msize, clnt->trans_mod->name
--
2.30.2
* [PATCH v6 05/11] 9p/trans_virtio: support larger msize values
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
The virtio transport supports by default a 9p 'msize' of up to
approximately 500 kB. This patch adds support for larger 'msize'
values by resizing the number of scatter/gather lists if required.
To be more precise, for the moment this patch increases the 'msize'
limit for the virtio transport to slightly below 4 MB. The virtio
transport actually supports much more (tested successfully with an
experimental QEMU version and some dirty 9p Linux client hacks up
to msize=128MB), but the client still uses linear buffers, which in
turn are limited to KMALLOC_MAX_SIZE (4M).
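Rough example of the sizing logic below (assuming 4k pages,
SG_MAX_SINGLE_ALLOC = 128, i.e. 127 usable entries per chained list,
and p9_max_pages not being the limiting factor):
  msize = 4 MB  ->  npages = 1024
  nsgl = DIV_ROUND_UP(1024, 127) = 9 chained lists
  trans_maxsize = PAGE_SIZE * min(9 * 127 - 2, 1024 - 2)
                = 4096 * 1022  (slightly below 4 MB)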
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
I am not sure whether it is safe the way SG lists are resized here. I
"think" Dominique said before there should be no concurrency here, but
it probably deserves a revisit.
net/9p/trans_virtio.c | 79 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 5ac533f83322..921caa022570 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -36,6 +36,16 @@
#include <linux/virtio_9p.h>
#include "trans_common.h"
+/*
+ * Maximum amount of virtio descriptors allowed per virtio round-trip
+ * message.
+ *
+ * This effectively limits msize to (slightly below) 4M, virtio transport
+ * actually supports much more, but client still uses linear buffers, which
+ * in turn are limited to KMALLOC_MAX_SIZE (4M).
+ */
+#define VIRTIO_MAX_DESCRIPTORS 1024
+
/**
* struct virtqueue_sg - (chained) scatter gather lists for virtqueue data
* transmission
@@ -203,6 +213,31 @@ static struct virtqueue_sg *vq_sg_alloc(unsigned int nsgl)
return vq_sg;
}
+/**
+ * vq_sg_resize - resize passed virtqueue scatter/gather lists to the passed
+ * amount of lists
+ * @_vq_sg: scatter/gather lists to be resized
+ * @nsgl: new amount of scatter/gather lists
+ */
+static int vq_sg_resize(struct virtqueue_sg **_vq_sg, unsigned int nsgl)
+{
+ struct virtqueue_sg *vq_sg;
+
+ BUG_ON(!_vq_sg || !nsgl);
+ vq_sg = *_vq_sg;
+ if (vq_sg->nsgl == nsgl)
+ return 0;
+
+ /* lazy resize implementation for now */
+ vq_sg = vq_sg_alloc(nsgl);
+ if (!vq_sg)
+ return -ENOMEM;
+
+ kfree(*_vq_sg);
+ *_vq_sg = vq_sg;
+ return 0;
+}
+
/**
* p9_virtio_close - reclaim resources of a channel
* @client: client instance
@@ -774,6 +809,10 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args)
struct virtio_chan *chan;
int ret = -ENOENT;
int found = 0;
+#if !defined(CONFIG_ARCH_NO_SG_CHAIN)
+ size_t npages;
+ size_t nsgl;
+#endif
if (devname == NULL)
return -EINVAL;
@@ -796,6 +835,46 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args)
return ret;
}
+ /*
+ * if user supplied an 'msize' option that's larger than what this
+ * transport supports by default, then try to allocate more sg lists
+ */
+ if (client->msize > client->trans_maxsize) {
+#ifdef CONFIG_ARCH_NO_SG_CHAIN
+ pr_info("limiting 'msize' to %d because architecture does not "
+ "support chained scatter gather lists\n",
+ client->trans_maxsize);
+#else
+ npages = DIV_ROUND_UP(client->msize, PAGE_SIZE);
+ if (npages > VIRTIO_MAX_DESCRIPTORS)
+ npages = VIRTIO_MAX_DESCRIPTORS;
+ if (npages > chan->p9_max_pages) {
+ npages = chan->p9_max_pages;
+ pr_info("limiting 'msize' as it would exceed the max. "
+ "of %lu pages allowed on this system\n",
+ chan->p9_max_pages);
+ }
+ nsgl = DIV_ROUND_UP(npages, SG_USER_PAGES_PER_LIST);
+ if (nsgl > chan->vq_sg->nsgl) {
+ /*
+ * if resize fails, no big deal, then just continue with
+ * whatever we got
+ */
+ if (!vq_sg_resize(&chan->vq_sg, nsgl)) {
+ /*
+ * decrement 2 pages as both 9p request and 9p reply have
+ * to fit into the virtio round-trip message
+ */
+ client->trans_maxsize =
+ PAGE_SIZE *
+ clamp_t(int,
+ (nsgl * SG_USER_PAGES_PER_LIST) - 2,
+ 0, VIRTIO_MAX_DESCRIPTORS - 2);
+ }
+ }
+#endif /* CONFIG_ARCH_NO_SG_CHAIN */
+ }
+
client->trans = (void *)chan;
client->status = Connected;
chan->client = client;
--
2.30.2
* [PATCH v6 06/11] 9p/trans_virtio: resize sg lists to whatever is possible
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Right now vq_sg_resize() uses a lazy implementation following
the all-or-nothing principle: it either resizes exactly to the
requested new amount of sg lists, or it does not resize at all.
The problem with this is that if a user supplies a very large
msize value, the resize would simply fail and the user would be
stuck with the default maximum msize supported by the virtio
transport.
To resolve this potential issue, change vq_sg_resize() to resize
the passed sg lists to whatever is possible on the machine.
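For example (illustrative numbers only, again assuming 127 usable
entries per chained list): if 9 lists are requested but only 6 can be
allocated, the channel keeps those 6 chained lists and
  trans_maxsize = PAGE_SIZE * (6 * 127 - 2) = 4096 * 760  (ca. 3 MB)
instead of falling back to the transport's default maximum.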
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
As previously discussed in v5, this patch could probably be dropped.
net/9p/trans_virtio.c | 76 +++++++++++++++++++++++++++++++++++--------
1 file changed, 63 insertions(+), 13 deletions(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 921caa022570..52d00cb3c105 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -218,24 +218,67 @@ static struct virtqueue_sg *vq_sg_alloc(unsigned int nsgl)
* amount of lists
* @_vq_sg: scatter/gather lists to be resized
* @nsgl: new amount of scatter/gather lists
+ *
+ * Old scatter/gather lists are retained. Only growing the size is supported.
+ * If the requested amount cannot be satisfied, then lists are increased to
+ * whatever is possible.
*/
static int vq_sg_resize(struct virtqueue_sg **_vq_sg, unsigned int nsgl)
{
struct virtqueue_sg *vq_sg;
+ unsigned int i;
+ size_t sz;
+ int ret = 0;
BUG_ON(!_vq_sg || !nsgl);
vq_sg = *_vq_sg;
+ if (nsgl > VIRTQUEUE_SG_NSGL_MAX)
+ nsgl = VIRTQUEUE_SG_NSGL_MAX;
if (vq_sg->nsgl == nsgl)
return 0;
+ if (vq_sg->nsgl > nsgl)
+ return -ENOTSUPP;
+
+ vq_sg = kzalloc(sizeof(struct virtqueue_sg) +
+ nsgl * sizeof(struct scatterlist *),
+ GFP_KERNEL);
- /* lazy resize implementation for now */
- vq_sg = vq_sg_alloc(nsgl);
if (!vq_sg)
return -ENOMEM;
+ /* copy over old scatter gather lists */
+ sz = sizeof(struct virtqueue_sg) +
+ (*_vq_sg)->nsgl * sizeof(struct scatterlist *);
+ memcpy(vq_sg, *_vq_sg, sz);
+
+ vq_sg->nsgl = nsgl;
+
+ for (i = (*_vq_sg)->nsgl; i < nsgl; ++i) {
+ vq_sg->sgl[i] = kmalloc_array(
+ SG_MAX_SINGLE_ALLOC, sizeof(struct scatterlist),
+ GFP_KERNEL
+ );
+ /*
+ * handle failed allocation as soft error, we take whatever
+ * we get
+ */
+ if (!vq_sg->sgl[i]) {
+ ret = -ENOMEM;
+ vq_sg->nsgl = nsgl = i;
+ break;
+ }
+ sg_init_table(vq_sg->sgl[i], SG_MAX_SINGLE_ALLOC);
+ if (i) {
+ /* chain the lists */
+ sg_chain(vq_sg->sgl[i - 1], SG_MAX_SINGLE_ALLOC,
+ vq_sg->sgl[i]);
+ }
+ }
+ sg_mark_end(&vq_sg->sgl[nsgl - 1][SG_MAX_SINGLE_ALLOC - 1]);
+
kfree(*_vq_sg);
*_vq_sg = vq_sg;
- return 0;
+ return ret;
}
/**
@@ -860,16 +903,23 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args)
* if resize fails, no big deal, then just continue with
* whatever we got
*/
- if (!vq_sg_resize(&chan->vq_sg, nsgl)) {
- /*
- * decrement 2 pages as both 9p request and 9p reply have
- * to fit into the virtio round-trip message
- */
- client->trans_maxsize =
- PAGE_SIZE *
- clamp_t(int,
- (nsgl * SG_USER_PAGES_PER_LIST) - 2,
- 0, VIRTIO_MAX_DESCRIPTORS - 2);
+ vq_sg_resize(&chan->vq_sg, nsgl);
+ /*
+ * actual allocation size might be less than requested, so use
+ * vq_sg->nsgl instead of nsgl, and decrement 2 pages as both
+ * 9p request and 9p reply have to fit into the virtio
+ * round-trip message
+ */
+ client->trans_maxsize =
+ PAGE_SIZE *
+ clamp_t(int,
+ (chan->vq_sg->nsgl * SG_USER_PAGES_PER_LIST) - 2,
+ 0, VIRTIO_MAX_DESCRIPTORS - 2);
+ if (nsgl > chan->vq_sg->nsgl) {
+ pr_info("limiting 'msize' to %d as only %d "
+ "of %zu SG lists could be allocated",
+ client->trans_maxsize,
+ chan->vq_sg->nsgl, nsgl);
}
}
#endif /* CONFIG_ARCH_NO_SG_CHAIN */
--
2.30.2
* [PATCH v6 07/11] net/9p: split message size argument into 't_size' and 'r_size' pair
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Refactor the 'max_size' argument of p9_tag_alloc() and the 'req_size'
argument of p9_client_prepare_req() into a pair of arguments 't_size'
and 'r_size' each, to allow handling the buffer sizes for request and
reply separately from each other.
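To illustrate why the split is useful (sizes per the 9p2000 wire
format, just as an example): for a large read the request message is
tiny while the reply has to hold the payload, so the two buffers can
eventually be sized independently:
  Tread: size[4] type[1] tag[2] fid[4] offset[8] count[4]  -> ~23 bytes
  Rread: size[4] type[1] tag[2] count[4] data[count]       -> ~11 bytes + count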
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/client.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 20054addd81b..32a8f2f43479 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -255,24 +255,26 @@ static struct kmem_cache *p9_req_cache;
* p9_tag_alloc - Allocate a new request.
* @c: Client session.
* @type: Transaction type.
- * @max_size: Maximum packet size for this request.
+ * @t_size: Buffer size for holding this request.
+ * @r_size: Buffer size for holding server's reply on this request.
*
* Context: Process context.
* Return: Pointer to new request.
*/
static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size)
{
struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
- int alloc_msize = min(c->msize, max_size);
+ int alloc_tsize = min(c->msize, t_size);
+ int alloc_rsize = min(c->msize, r_size);
int tag;
if (!req)
return ERR_PTR(-ENOMEM);
- if (p9_fcall_init(c, &req->tc, alloc_msize))
+ if (p9_fcall_init(c, &req->tc, alloc_tsize))
goto free_req;
- if (p9_fcall_init(c, &req->rc, alloc_msize))
+ if (p9_fcall_init(c, &req->rc, alloc_rsize))
goto free;
p9pdu_reset(&req->tc);
@@ -678,7 +680,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
}
static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
- int8_t type, int req_size,
+ int8_t type, uint t_size, uint r_size,
const char *fmt, va_list ap)
{
int err;
@@ -694,7 +696,7 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
if (c->status == BeginDisconnect && type != P9_TCLUNK)
return ERR_PTR(-EIO);
- req = p9_tag_alloc(c, type, req_size);
+ req = p9_tag_alloc(c, type, t_size, r_size);
if (IS_ERR(req))
return req;
@@ -731,7 +733,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
struct p9_req_t *req;
va_start(ap, fmt);
- req = p9_client_prepare_req(c, type, c->msize, fmt, ap);
+ req = p9_client_prepare_req(c, type, c->msize, c->msize, fmt, ap);
va_end(ap);
if (IS_ERR(req))
return req;
@@ -829,7 +831,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
/* We allocate a inline protocol data of only 4k bytes.
* The actual content is passed in zero-copy fashion.
*/
- req = p9_client_prepare_req(c, type, P9_ZC_HDR_SZ, fmt, ap);
+ req = p9_client_prepare_req(c, type, P9_ZC_HDR_SZ, P9_ZC_HDR_SZ, fmt, ap);
va_end(ap);
if (IS_ERR(req))
return req;
--
2.30.2
* [PATCH v6 08/11] 9p: add P9_ERRMAX for 9p2000 and 9p2000.u
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Add a P9_ERRMAX macro to the 9P protocol header which reflects the
maximum error string length of Rerror replies for the 9p2000 and
9p2000.u protocol versions. Unfortunately a maximum error string length
is not defined by the 9p2000 spec, so pick 128 as the value for now, as
this seems to be a common maximum size for POSIX error strings in
practice.
The 9p2000.L protocol version uses Rlerror replies instead, which do
not contain an error string.
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
This could probably be merged with the next patch; in doubt I posted it
separately, as squashing is easy. The advantage of a separate patch is
that it makes the discussion of the chosen maximum value of 128 bytes
more prominent.
include/net/9p/9p.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 24a509f559ee..13abe013af21 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -331,6 +331,9 @@ enum p9_qid_t {
/* size of header for zero copy read/write */
#define P9_ZC_HDR_SZ 4096
+/* maximum length of an error string */
+#define P9_ERRMAX 128
+
/**
* struct p9_qid - file system entity information
* @type: 8-bit type &p9_qid_t
--
2.30.2
* [PATCH v6 09/11] net/9p: add p9_msg_buf_size()
From: Christian Schoenebeck @ 2022-07-15 21:32 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
This new function calculates a buffer size suitable for holding the
intended 9p request or response. For rather small message types (which
applies to almost all 9p message types actually) simply use hard coded
values. For some variable-length and potentially large message types
calculate a more precise value according to what data is actually
transmitted to avoid unnecessarily huge buffers.
So p9_msg_buf_size() divides the individual 9p message types into 3
message size categories:
- dynamically calculated message size (i.e. potentially large)
- 8k hard coded message size
- 4k hard coded message size
As for the latter two hard coded size categories: for most 9p message
types it is pretty obvious whether they would always fit into 4k or
8k. For some of them, however, it depends on the maximum directory
entry name length allowed by the OS and filesystem into which of the
two size categories they fit. Currently Linux supports directory entry
names up to NAME_MAX (255), however when comparing the limitations of
individual filesystems, ReiserFS theoretically supports names up to
slightly below 4k long. So in order to make this code more future
proof, and as revisiting it later on is a bit tedious and has the
potential to miss details, the decision [1] was made to take 4k as the
basis for the maximum name length.
Link: https://lore.kernel.org/all/5564296.oo812IJUPE@silver/ [1]
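As a small worked example of the dynamically calculated category
(hypothetical path components, header sizes as in the code below): a
Twalk with the two wnames 'usr' and 'lib' would be sized as
  hdr(7) + fid[4] + newfid[4] + nwname[2] + (2+3) + (2+3) = 27 bytes
rather than as a full msize-sized buffer.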
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/protocol.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++
net/9p/protocol.h | 2 +
2 files changed, 169 insertions(+)
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 3754c33e2974..1b7fea87fbe9 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -23,6 +23,173 @@
#include <trace/events/9p.h>
+/* len[2] text[len] */
+#define P9_STRLEN(s) \
+ (2 + min_t(size_t, s ? strlen(s) : 0, USHRT_MAX))
+
+/**
+ * p9_msg_buf_size - Returns a buffer size sufficiently large to hold the
+ * intended 9p message.
+ * @c: client
+ * @type: message type
+ * @fmt: format template for assembling request message
+ * (see p9pdu_vwritef)
+ * @ap: variable arguments to be fed to passed format template
+ * (see p9pdu_vwritef)
+ *
+ * Note: Even for response types (P9_R*) the format template and variable
+ * arguments must always be for the originating request type (P9_T*).
+ */
+size_t p9_msg_buf_size(struct p9_client *c, enum p9_msg_t type,
+ const char *fmt, va_list ap)
+{
+ /* size[4] type[1] tag[2] */
+ const int hdr = 4 + 1 + 2;
+ /* ename[s] errno[4] */
+ const int rerror_size = hdr + P9_ERRMAX + 4;
+ /* ecode[4] */
+ const int rlerror_size = hdr + 4;
+ const int err_size =
+ c->proto_version == p9_proto_2000L ? rlerror_size : rerror_size;
+
+ static_assert(NAME_MAX <= 4*1024, "p9_msg_buf_size() currently assumes "
+ "a max. allowed directory entry name length of 4k");
+
+ switch (type) {
+
+ /* message types not used at all */
+ case P9_TERROR:
+ case P9_TLERROR:
+ case P9_TAUTH:
+ case P9_RAUTH:
+ BUG();
+
+ /* variable length & potentially large message types */
+ case P9_TATTACH:
+ BUG_ON(strcmp("ddss?u", fmt));
+ va_arg(ap, int32_t);
+ va_arg(ap, int32_t);
+ {
+ const char *uname = va_arg(ap, const char *);
+ const char *aname = va_arg(ap, const char *);
+ /* fid[4] afid[4] uname[s] aname[s] n_uname[4] */
+ return hdr + 4 + 4 + P9_STRLEN(uname) + P9_STRLEN(aname) + 4;
+ }
+ case P9_TWALK:
+ BUG_ON(strcmp("ddT", fmt));
+ va_arg(ap, int32_t);
+ va_arg(ap, int32_t);
+ {
+ uint i, nwname = va_arg(ap, int);
+ size_t wname_all;
+ const char **wnames = va_arg(ap, const char **);
+ for (i = 0, wname_all = 0; i < nwname; ++i) {
+ wname_all += P9_STRLEN(wnames[i]);
+ }
+ /* fid[4] newfid[4] nwname[2] nwname*(wname[s]) */
+ return hdr + 4 + 4 + 2 + wname_all;
+ }
+ case P9_RWALK:
+ BUG_ON(strcmp("ddT", fmt));
+ va_arg(ap, int32_t);
+ va_arg(ap, int32_t);
+ {
+ uint nwname = va_arg(ap, int);
+ /* nwqid[2] nwqid*(wqid[13]) */
+ return max_t(size_t, hdr + 2 + nwname * 13, err_size);
+ }
+ case P9_TCREATE:
+ BUG_ON(strcmp("dsdb?s", fmt));
+ va_arg(ap, int32_t);
+ {
+ const char *name = va_arg(ap, const char *);
+ if (c->proto_version == p9_proto_legacy) {
+ /* fid[4] name[s] perm[4] mode[1] */
+ return hdr + 4 + P9_STRLEN(name) + 4 + 1;
+ } else {
+ va_arg(ap, int32_t);
+ va_arg(ap, int);
+ {
+ const char *ext = va_arg(ap, const char *);
+ /* fid[4] name[s] perm[4] mode[1] extension[s] */
+ return hdr + 4 + P9_STRLEN(name) + 4 + 1 + P9_STRLEN(ext);
+ }
+ }
+ }
+ case P9_TLCREATE:
+ BUG_ON(strcmp("dsddg", fmt));
+ va_arg(ap, int32_t);
+ {
+ const char *name = va_arg(ap, const char *);
+ /* fid[4] name[s] flags[4] mode[4] gid[4] */
+ return hdr + 4 + P9_STRLEN(name) + 4 + 4 + 4;
+ }
+ case P9_RREAD:
+ case P9_RREADDIR:
+ BUG_ON(strcmp("dqd", fmt));
+ va_arg(ap, int32_t);
+ va_arg(ap, int64_t);
+ {
+ const int32_t count = va_arg(ap, int32_t);
+ /* count[4] data[count] */
+ return max_t(size_t, hdr + 4 + count, err_size);
+ }
+ case P9_TWRITE:
+ BUG_ON(strcmp("dqV", fmt));
+ va_arg(ap, int32_t);
+ va_arg(ap, int64_t);
+ {
+ const int32_t count = va_arg(ap, int32_t);
+ /* fid[4] offset[8] count[4] data[count] */
+ return hdr + 4 + 8 + 4 + count;
+ }
+ case P9_TRENAMEAT:
+ BUG_ON(strcmp("dsds", fmt));
+ va_arg(ap, int32_t);
+ {
+ const char *oldname, *newname;
+ oldname = va_arg(ap, const char *);
+ va_arg(ap, int32_t);
+ newname = va_arg(ap, const char *);
+ /* olddirfid[4] oldname[s] newdirfid[4] newname[s] */
+ return hdr + 4 + P9_STRLEN(oldname) + 4 + P9_STRLEN(newname);
+ }
+ case P9_TSYMLINK:
+ BUG_ON(strcmp("dssg", fmt));
+ va_arg(ap, int32_t);
+ {
+ const char *name = va_arg(ap, const char *);
+ const char *symtgt = va_arg(ap, const char *);
+ /* fid[4] name[s] symtgt[s] gid[4] */
+ return hdr + 4 + P9_STRLEN(name) + P9_STRLEN(symtgt) + 4;
+ }
+
+ case P9_RERROR:
+ return rerror_size;
+ case P9_RLERROR:
+ return rlerror_size;
+
+ /* small message types */
+ case P9_TWSTAT:
+ case P9_RSTAT:
+ case P9_RREADLINK:
+ case P9_TXATTRWALK:
+ case P9_TXATTRCREATE:
+ case P9_TLINK:
+ case P9_TMKDIR:
+ case P9_TMKNOD:
+ case P9_TRENAME:
+ case P9_TUNLINKAT:
+ case P9_TLOCK:
+ return 8 * 1024;
+
+ /* tiny message types */
+ default:
+ return 4 * 1024;
+
+ }
+}
+
static int
p9pdu_writef(struct p9_fcall *pdu, int proto_version, const char *fmt, ...);
diff --git a/net/9p/protocol.h b/net/9p/protocol.h
index 6d719c30331a..ad2283d1f96b 100644
--- a/net/9p/protocol.h
+++ b/net/9p/protocol.h
@@ -8,6 +8,8 @@
* Copyright (C) 2008 by IBM, Corp.
*/
+size_t p9_msg_buf_size(struct p9_client *c, enum p9_msg_t type,
+ const char *fmt, va_list ap);
int p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
va_list ap);
int p9pdu_readf(struct p9_fcall *pdu, int proto_version, const char *fmt, ...);
--
2.30.2
* [PATCH v6 10/11] net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module
From: Christian Schoenebeck @ 2022-07-15 21:33 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
This is a preparatory change for the subsequent patch: the RDMA
transport pulls the buffers for its 9p response messages from a
shared pool [1], so this case has to be considered there when
choosing an appropriate response message size.
Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
include/net/9p/transport.h | 5 +++++
net/9p/trans_fd.c | 1 +
net/9p/trans_rdma.c | 1 +
net/9p/trans_virtio.c | 1 +
net/9p/trans_xen.c | 1 +
5 files changed, 9 insertions(+)
diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index ff842f963071..766ec07c9599 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -19,6 +19,10 @@
* @list: used to maintain a list of currently available transports
* @name: the human-readable name of the transport
* @maxsize: transport provided maximum packet size
+ * @pooled_rbuffers: currently only set for RDMA transport which pulls the
+ * response buffers from a shared pool, and accordingly
+ * we're less flexible when choosing the response message
+ * size in this case
* @def: set if this transport should be considered the default
* @create: member function to create a new connection on this transport
* @close: member function to discard a connection on this transport
@@ -38,6 +42,7 @@ struct p9_trans_module {
struct list_head list;
char *name; /* name of transport */
int maxsize; /* max message size of transport */
+ bool pooled_rbuffers;
int def; /* this transport should be default */
struct module *owner;
int (*create)(struct p9_client *client,
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 8f8f95e39b03..eecbb5332bea 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1081,6 +1081,7 @@ p9_fd_create(struct p9_client *client, const char *addr, char *args)
static struct p9_trans_module p9_tcp_trans = {
.name = "tcp",
.maxsize = MAX_SOCK_BUF,
+ .pooled_rbuffers = false,
.def = 0,
.create = p9_fd_create_tcp,
.close = p9_fd_close,
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 88e563826674..24f287baee70 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -739,6 +739,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
static struct p9_trans_module p9_rdma_trans = {
.name = "rdma",
.maxsize = P9_RDMA_MAXSIZE,
+ .pooled_rbuffers = true,
.def = 0,
.owner = THIS_MODULE,
.create = rdma_create_trans,
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 52d00cb3c105..d47b28b3f02a 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -1011,6 +1011,7 @@ static struct p9_trans_module p9_virtio_trans = {
*/
.maxsize = PAGE_SIZE *
((VIRTQUEUE_SG_NSGL_DEFAULT * SG_USER_PAGES_PER_LIST) - 3),
+ .pooled_rbuffers = false,
.def = 1,
.owner = THIS_MODULE,
};
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 833cd3792c51..3434a080abfa 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -246,6 +246,7 @@ static irqreturn_t xen_9pfs_front_event_handler(int irq, void *r)
static struct p9_trans_module p9_xen_trans = {
.name = "xen",
.maxsize = 1 << (XEN_9PFS_RING_ORDER + XEN_PAGE_SHIFT - 2),
+ .pooled_rbuffers = false,
.def = 1,
.create = p9_xen_create,
.close = p9_xen_close,
--
2.30.2
* [PATCH v6 11/11] net/9p: allocate appropriate reduced message buffers
From: Christian Schoenebeck @ 2022-07-15 21:33 UTC (permalink / raw)
To: v9fs-developer
Cc: linux-kernel, netdev, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
So far 'msize' was simply used as the buffer size for all 9p message
types, which is far too much and slowed down performance tremendously
with large values of the user configurable 'msize' option.
Let's stop this waste by using the new p9_msg_buf_size() function to
allocate more appropriate, smaller buffers according to what is
actually sent over the wire.
Only exception: the RDMA transport is currently excluded from this
message size optimization - for its response buffers, that is - as the
RDMA transport would not cope with it, due to its response buffers
being pulled from a shared pool. [1]
Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
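For illustration (example numbers only, assuming msize=1M and the size
table from the previous patch): a Tgetattr so far allocated two 1 MB
buffers, one for the request and one for the reply, whereas with this
change it allocates a 4 kB request buffer and a 4 kB reply buffer; only
genuinely large messages such as Twrite requests and Rread replies
still scale with the actual payload size.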
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 32a8f2f43479..f068f4b656b5 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -255,19 +255,35 @@ static struct kmem_cache *p9_req_cache;
* p9_tag_alloc - Allocate a new request.
* @c: Client session.
* @type: Transaction type.
- * @t_size: Buffer size for holding this request.
- * @r_size: Buffer size for holding server's reply on this request.
+ * @t_size: Buffer size for holding this request
+ * (automatic calculation by format template if 0).
+ * @r_size: Buffer size for holding server's reply on this request
+ * (automatic calculation by format template if 0).
+ * @fmt: Format template for assembling 9p request message
+ * (see p9pdu_vwritef).
+ * @ap: Variable arguments to be fed to passed format template
+ * (see p9pdu_vwritef).
*
* Context: Process context.
* Return: Pointer to new request.
*/
static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
+ const char *fmt, va_list ap)
{
struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
- int alloc_tsize = min(c->msize, t_size);
- int alloc_rsize = min(c->msize, r_size);
+ int alloc_tsize;
+ int alloc_rsize;
int tag;
+ va_list apc;
+
+ va_copy(apc, ap);
+ alloc_tsize = min_t(size_t, c->msize,
+ t_size ?: p9_msg_buf_size(c, type, fmt, apc));
+ va_end(apc);
+
+ alloc_rsize = min_t(size_t, c->msize,
+ r_size ?: p9_msg_buf_size(c, type + 1, fmt, ap));
if (!req)
return ERR_PTR(-ENOMEM);
@@ -685,6 +701,7 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
{
int err;
struct p9_req_t *req;
+ va_list apc;
p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
@@ -696,7 +713,9 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
if (c->status == BeginDisconnect && type != P9_TCLUNK)
return ERR_PTR(-EIO);
- req = p9_tag_alloc(c, type, t_size, r_size);
+ va_copy(apc, ap);
+ req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc);
+ va_end(apc);
if (IS_ERR(req))
return req;
@@ -731,9 +750,18 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
int sigpending, err;
unsigned long flags;
struct p9_req_t *req;
+ /* Passing zero for tsize/rsize to p9_client_prepare_req() tells it to
+ * auto determine an appropriate (small) request/response size
+ * according to actual message data being sent. Currently RDMA
+ * transport is excluded from this response message size optimization,
+ * as it would not cope with it, due to its pooled response buffers
+ * (using an optimized request size for RDMA as well though).
+ */
+ const uint tsize = 0;
+ const uint rsize = c->trans_mod->pooled_rbuffers ? c->msize : 0;
va_start(ap, fmt);
- req = p9_client_prepare_req(c, type, c->msize, c->msize, fmt, ap);
+ req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap);
va_end(ap);
if (IS_ERR(req))
return req;
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v6 11/11] net/9p: allocate appropriate reduced message buffers
2022-07-15 21:33 ` [PATCH v6 11/11] net/9p: allocate appropriate reduced message buffers Christian Schoenebeck
@ 2022-10-17 17:03 ` Jason Gunthorpe
2022-10-17 18:03 ` Christian Schoenebeck
0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-10-17 17:03 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: v9fs-developer, linux-kernel, netdev, Dominique Martinet,
Eric Van Hensbergen, Latchesar Ionkov, Nikolay Kichukov,
Leon Romanovsky
On Fri, Jul 15, 2022 at 11:33:56PM +0200, Christian Schoenebeck wrote:
> So far 'msize' was simply used for all 9p message types, which is far
> too much and slowed down performance tremendously with large values
> for user configurable 'msize' option.
>
> Let's stop this waste by using the new p9_msg_buf_size() function for
> allocating more appropriate, smaller buffers according to what is
> actually sent over the wire.
>
> Only exception: RDMA transport is currently excluded from this message
> size optimization - for its response buffers that is - as RDMA transport
> would not cope with it, due to its response buffers being pulled from a
> shared pool. [1]
>
> Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
> Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
> ---
> net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
> 1 file changed, 35 insertions(+), 7 deletions(-)
It took me a while to sort out, but for any others - this patch is
incompatible with qemu 5.0. It starts working again after this qemu
patch:
commit cf45183b718f02b1369e18c795dc51bc1821245d
Author: Stefano Stabellini <stefano.stabellini@xilinx.com>
Date: Thu May 21 12:26:25 2020 -0700
Revert "9p: init_in_iov_from_pdu can truncate the size"
This reverts commit 16724a173049ac29c7b5ade741da93a0f46edff7.
It causes https://bugs.launchpad.net/bugs/1877688.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <20200521192627.15259-1-sstabellini@kernel.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
It causes something like this:
# modprobe ib_cm
qemu-system-x86_64: VirtFS reply type 117 needs 17 bytes, buffer has 17, less than minimum
Jason
* Re: [PATCH v6 11/11] net/9p: allocate appropriate reduced message buffers
From: Christian Schoenebeck @ 2022-10-17 18:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: v9fs-developer, linux-kernel, netdev, Dominique Martinet,
Eric Van Hensbergen, Latchesar Ionkov, Nikolay Kichukov,
Leon Romanovsky, Greg Kurz, Stefano Stabellini
On Monday, October 17, 2022 7:03:11 PM CEST Jason Gunthorpe wrote:
> On Fri, Jul 15, 2022 at 11:33:56PM +0200, Christian Schoenebeck wrote:
> > So far 'msize' was simply used for all 9p message types, which is far
> > too much and slowed down performance tremendously with large values
> > for user configurable 'msize' option.
> >
> > Let's stop this waste by using the new p9_msg_buf_size() function for
> > allocating more appropriate, smaller buffers according to what is
> > actually sent over the wire.
> >
> > Only exception: RDMA transport is currently excluded from this message
> > size optimization - for its response buffers that is - as RDMA transport
> > would not cope with it, due to its response buffers being pulled from a
> > shared pool. [1]
> >
> > Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
> > Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
> > ---
> >
> > net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
> > 1 file changed, 35 insertions(+), 7 deletions(-)
>
> It took me a while to sort out, but for any others - this patch is
> incompatible with qemu 5.0. It starts working again after this qemu
> patch:
>
> commit cf45183b718f02b1369e18c795dc51bc1821245d
> Author: Stefano Stabellini <stefano.stabellini@xilinx.com>
> Date: Thu May 21 12:26:25 2020 -0700
>
> Revert "9p: init_in_iov_from_pdu can truncate the size"
>
> This reverts commit 16724a173049ac29c7b5ade741da93a0f46edff7.
> It causes https://bugs.launchpad.net/bugs/1877688.
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Message-Id: <20200521192627.15259-1-sstabellini@kernel.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>
>
> It causes something like this:
>
> # modprobe ib_cm
> qemu-system-x86_64: VirtFS reply type 117 needs 17 bytes, buffer has 17,
> less than minimum
The 9p server in QEMU 5.0 was broken by the mentioned (and meanwhile
reverted) QEMU patch, and this was already fixed in stable release 5.0.1.
It is not that the recent kernel patch breaks behaviour; it merely triggers
that (short-lived) QEMU bug more reliably, because the 9p client now uses
smaller messages more often. Even without this kernel patch you would still
get a QEMU hang with short I/O. So it is not a good idea to keep using that
particular, old QEMU version; please update to at least QEMU 5.0.1.
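Just to illustrate what "smaller messages" means here, a rough sketch of
the idea (the helper name below is made up; this is not the actual
p9_msg_buf_size() implementation from the series, which covers all message
types and both directions):

#include <linux/types.h>
#include <net/9p/9p.h>  /* P9_TGETATTR, P9_TWRITE, P9_RREAD */

/* Sketch only: pick a buffer size by 9p message type instead of always
 * allocating a full 'msize' bytes per request/response buffer.
 */
static size_t sketch_buf_size(u8 type, size_t msize)
{
        switch (type) {
        case P9_TGETATTR:
                /* size[4] type[1] tag[2] + fid[4] + request_mask[8] */
                return 4 + 1 + 2 + 4 + 8;
        case P9_TWRITE: /* request carries the write payload */
        case P9_RREAD:  /* reply carries the read payload */
                return msize;
        default:
                /* conservative fallback: keep the old msize-sized buffer */
                return msize;
        }
}

With buffers that are only just big enough for the small message types,
the reply-size check on the server side now sees exact-fit buffers, which
is why this change exposes the (already fixed) QEMU 5.0 bug so reliably.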
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-15 21:35 [PATCH v6 00/11] remove msize limit in virtio transport Christian Schoenebeck
` (10 preceding siblings ...)
2022-07-15 21:33 ` [PATCH v6 11/11] net/9p: allocate appropriate reduced message buffers Christian Schoenebeck
@ 2022-07-15 22:30 ` Dominique Martinet
2022-07-15 23:28 ` Dominique Martinet
11 siblings, 1 reply; 20+ messages in thread
From: Dominique Martinet @ 2022-07-15 22:30 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Christian Schoenebeck wrote on Fri, Jul 15, 2022 at 11:35:26PM +0200:
> * Patches 7..11 tremendously reduce unnecessarily huge 9p message sizes and
> therefore provide performance gain as well. So far, almost all 9p messages
> simply allocated message buffers exactly msize large, even for messages
> that actually just needed few bytes. So these patches make sense by
> themselves, independent of this overall series, however for this series
> even more, because the larger msize, the more this issue would have hurt
> otherwise.
Unless they got stuck somewhere the mails are missing patches 10 and 11,
one too many 0s to git send-email ?
I'll do a quick review from github commit meanwhile
--
Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-15 22:30 ` [PATCH v6 00/11] remove msize limit in virtio transport Dominique Martinet
@ 2022-07-15 23:28 ` Dominique Martinet
2022-07-16 9:54 ` Christian Schoenebeck
0 siblings, 1 reply; 20+ messages in thread
From: Dominique Martinet @ 2022-07-15 23:28 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Dominique Martinet wrote on Sat, Jul 16, 2022 at 07:30:45AM +0900:
> Christian Schoenebeck wrote on Fri, Jul 15, 2022 at 11:35:26PM +0200:
> > * Patches 7..11 tremendously reduce unnecessarily huge 9p message sizes and
> > therefore provide performance gain as well. So far, almost all 9p messages
> > simply allocated message buffers exactly msize large, even for messages
> > that actually just needed few bytes. So these patches make sense by
> > themselves, independent of this overall series, however for this series
> > even more, because the larger msize, the more this issue would have hurt
> > otherwise.
>
> Unless they got stuck somewhere the mails are missing patches 10 and 11,
> one too many 0s to git send-email ?
nevermind, they just got in after 1h30... I thought only an hour had passed
since the first mails because those were already 50 mins late and I hadn't
noticed! I wonder where they got stuck; that timestamp is when
lizzy.crudebyte.com received them, and it filters out earlier headers, so
probably somewhere between you and it?
ohwell.
> I'll do a quick review from github commit meanwhile
Looks good to me, I'll try to get some tcp/rdma testing done this
weekend and stash them up to next
--
Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-15 23:28 ` Dominique Martinet
@ 2022-07-16 9:54 ` Christian Schoenebeck
2022-07-16 11:54 ` Dominique Martinet
0 siblings, 1 reply; 20+ messages in thread
From: Christian Schoenebeck @ 2022-07-16 9:54 UTC (permalink / raw)
To: Dominique Martinet
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
On Saturday, 16 July 2022 01:28:51 CEST Dominique Martinet wrote:
> Dominique Martinet wrote on Sat, Jul 16, 2022 at 07:30:45AM +0900:
> > Christian Schoenebeck wrote on Fri, Jul 15, 2022 at 11:35:26PM +0200:
> > > * Patches 7..11 tremendously reduce unnecessarily huge 9p message sizes and
> > > therefore provide performance gain as well. So far, almost all 9p messages
> > > simply allocated message buffers exactly msize large, even for messages
> > > that actually just needed few bytes. So these patches make sense by
> > > themselves, independent of this overall series, however for this series
> > > even more, because the larger msize, the more this issue would have hurt
> > > otherwise.
> >
> > Unless they got stuck somewhere the mails are missing patches 10 and 11,
> > one too many 0s to git send-email ?
>
> nevermind, they just got in after 1h30... I thought only an hour had passed
> since the first mails because those were already 50 mins late and I hadn't
> noticed! I wonder where they got stuck; that timestamp is when
> lizzy.crudebyte.com received them, and it filters out earlier headers, so
> probably somewhere between you and it?
Certainly an outbound SMTP greylisting delay, i.e. lack of karma. Sometimes my
patches make it to lists after 3 hours. I haven't figured out though why some
patches within the same series arrive significantly faster than others, which
is especially weird when they arrive out of the order they were sent.
> ohwell.
>
> > I'll do a quick review from github commit meanwhile
>
> Looks good to me, I'll try to get some tcp/rdma testing done this
> weekend and stash them up to next
Great, thanks!
> --
> Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-16 9:54 ` Christian Schoenebeck
@ 2022-07-16 11:54 ` Dominique Martinet
2022-07-16 12:10 ` Christian Schoenebeck
0 siblings, 1 reply; 20+ messages in thread
From: Dominique Martinet @ 2022-07-16 11:54 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Christian Schoenebeck wrote on Sat, Jul 16, 2022 at 11:54:29AM +0200:
> > Looks good to me, I'll try to get some tcp/rdma testing done this
> > weekend and stash them up to next
>
> Great, thanks!
Quick update on this: tcp seems to work fine, I need to let it run a bit
longer but not expecting any trouble.
RDMA is... complicated.
I was certain an adapter in loopback mode ought to work, so I just
bought a single cheap card, but I couldn't get it to work (ipoib works,
but I think that's just the linux tcp stack cheating; I'm getting unable
to resolve route (rdma_resolve_route) errors when trying real rdma
applications...)
OTOH, linux got softiwarp merged in as RDMA_SIW, which works perfectly
with my rdma applications. However, after fixing/working around a couple
of bugs on the server, I'm getting hangs that I can't reproduce with
debug on current master, so this isn't exactly great; not sure where it
goes wrong :|
At least with debug still enabled I'm not getting any new hang with your
patches, so let's call it ok...?
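For reference, a software-only loopback setup can be brought up roughly
like this (link name and mount point are arbitrary, and it assumes a 9p
server is already listening on the default RDMA port 5640 on localhost):

  # modprobe siw
  # rdma link add siw0 type siw netdev lo
  # mount -t 9p -o trans=rdma,port=5640 127.0.0.1 /mnt/9p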
I'll send a mail to ex-colleagues who might care about it (and
investigate a bit more if so), and a more open mail if that falls
short...
--
Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-16 11:54 ` Dominique Martinet
@ 2022-07-16 12:10 ` Christian Schoenebeck
2022-07-16 12:44 ` Dominique Martinet
0 siblings, 1 reply; 20+ messages in thread
From: Christian Schoenebeck @ 2022-07-16 12:10 UTC (permalink / raw)
To: Dominique Martinet
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
On Saturday, 16 July 2022 13:54:08 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Sat, Jul 16, 2022 at 11:54:29AM +0200:
> > > Looks good to me, I'll try to get some tcp/rdma testing done this
> > > weekend and stash them up to next
> >
> > Great, thanks!
>
> Quick update on this: tcp seems to work fine, I need to let it run a bit
> longer but not expecting any trouble.
>
> RDMA is... complicated.
> I was certain an adapter in loopback mode ought to work, so I just
> bought a single cheap card, but I couldn't get it to work (ipoib works,
> but I think that's just the linux tcp stack cheating; I'm getting unable
> to resolve route (rdma_resolve_route) errors when trying real rdma
> applications...)
>
>
> OTOH, linux got softiwarp merged in as RDMA_SIW, which works perfectly
> with my rdma applications. However, after fixing/working around a couple
> of bugs on the server, I'm getting hangs that I can't reproduce with
> debug on current master, so this isn't exactly great; not sure where it
> goes wrong :|
> At least with debug still enabled I'm not getting any new hang with your
> patches, so let's call it ok...?
Well, I would need more info to judge or resolve that, like which patch
exactly broke RDMA behaviour for you?
> I'll send a mail to ex-colleagues who might care about it (and
> investigate a bit more if so), and a more open mail if that falls
> short...
>
> --
> Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 00/11] remove msize limit in virtio transport
2022-07-16 12:10 ` Christian Schoenebeck
@ 2022-07-16 12:44 ` Dominique Martinet
0 siblings, 0 replies; 20+ messages in thread
From: Dominique Martinet @ 2022-07-16 12:44 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: v9fs-developer, linux-kernel, netdev, Eric Van Hensbergen,
Latchesar Ionkov, Nikolay Kichukov
Christian Schoenebeck wrote on Sat, Jul 16, 2022 at 02:10:05PM +0200:
> > OTOH, linux got softiwarp merged in as RDMA_SIW, which works perfectly
> > with my rdma applications. However, after fixing/working around a couple
> > of bugs on the server, I'm getting hangs that I can't reproduce with
> > debug on current master, so this isn't exactly great; not sure where it
> > goes wrong :|
> > At least with debug still enabled I'm not getting any new hang with your
> > patches, so let's call it ok...?
>
> Well, I would need more info to judge or resolve that, like which patch
> exactly broke RDMA behaviour for you?
I wouldn't have troubles if I knew that; I don't have access to the
hardware I last used 9p/rdma on, so it might very well be a softiwarp
compatibility problem, a server version issue, or anything else.
At the very least I'm not getting new errors and the server does receive
everything we sent, so as far as these patches are concerned I don't
think we're making anything worse.
I'll get back to you once I hear back from my former employer (if they can
have someone run some tests, confirm it works and/or bisect that); I've
really spent too much time already trying to get the old adapter I bought
working...
All I can say is that there's no error anywhere. I've finally reproduced
it once with debug enabled, and I can confirm the server sent the reply and
didn't get any error from ibv_post_send(), so the message should have been
sent, but the client just never processed it.
The next step would be to add/enable some logs on the client to see if it
actually received something or not and go from there, but I'd like to
see something that works first...
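FWIW the client side already has some tracing that might help narrow this
down, assuming CONFIG_NET_9P_DEBUG is enabled; something along these lines
(the mask value just means "everything", and <server> is whatever address
the RDMA setup resolves to):

  # mount -t 9p -o trans=rdma,port=5640,debug=0xffff <server> /mnt/9p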
--
Dominique
^ permalink raw reply [flat|nested] 20+ messages in thread