linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy
@ 2011-02-11  1:25 Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 1/7] [net/9p] Additional elements to p9_fcall to acoomodate zero copy Venkateswararao Jujjuri (JV)
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri

Changes from V2
---------------
o Format and variable corrections/changes
o Logic to determine nr_pages
o Added logic to avoid additional malloc to place mapped sglist.
o Don't attempt zero copy for small IO (<1024)
o Rearranged code.

In this patch series I am trying to take another stab at zero copy.
Please review and provide your feedback.

Goal:

9P Linux client makes an additional copy of read/write buffer into the kernel
buffer.  There are some transports(especially in the virtualization
environment) which can avoid this additional copy by directly sending user
buffer to the server.

Design Goals.

- Have minimal changes to the net layer so that common code is not polluted by
 the transport specifics.
- Create a common transport library which can be used by other transports.
- Avoid additional optimizations in the initial attempt (more details below)
 and focus on achieving basic functionality.

Design

Send the payload buffers separately to the transport layer if it asks for it.
Transport layer specifies the preference through newly introduced field in the
transport module.  (clnt->trans_mod->pref)
This method has few advantages.
  - Keeps the net layer clean and lets the transport layer deal with specifics.
  - mapping user addr into kernel pages pins the memory. Lack of flow control
    make the system vulnerable to denial-of-service attacks. This change gives
    transport layer more control to implement effective flow control.
 - If a transport layer doesn't see the need to handle payload separately,
   it can set the preference accordingly so that current code works with no
   changes. This is very useful for transports which has no plans of
   converting/pinning user pages. Especially things become more complex as
   copy_to_user()  is not possible as reads(RREAD) are handled by the
   transport layer in the interrupt context.

TREAD/RERROR scenario.
This is a rather sticky issue to deal with for the !dotl protocol. This is not
a problem in 9P2000.L as the error is a known size (errno) but in other
protocols it is a string of size (ERRMAX).  To take care of TREAD/RERROR
scenario in !dotl we make sure that the read buffer is big enough to
accommodate  ERRMAX string. If the read size is small, don't send the payload
buffer separately to the transport layer  even if it set its preferences other
way (P9_TRANS_PREF_PAYLOAD_SEP).

For bigger reads, RERROR is handled by copying back user buffers into kernel
buffer in the case of error. As this is done only in the error path it should
not affect the regular performance.

Created trans_common.[ch] to house common functions so that other transport
layers can take advantage of them.

msize: One of the major advantage of this patch series is to have bigger msize
to pull off bigger read/writes from the server. Increasing the msize is not
really a solution as majority of other transactions are extremely small which
could result in waste of kernel heap.  To address this problem we need to have
two sizes of PDUs.
Given that this is an additional optimization/usecase of zero copy..and not a
NEED to implement zerocopy itself, I am differing it to next round of changes.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC-V3 1/7] [net/9p] Additional elements to p9_fcall to acoomodate zero copy.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 2/7] [net/9p] Adds supporting functions for " Venkateswararao Jujjuri (JV)
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 include/net/9p/9p.h |    8 ++++++++
 net/9p/protocol.c   |    4 ++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 071fd7a..7aefa6d 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -689,6 +689,10 @@ struct p9_rwstat {
  * @tag: transaction id of the request
  * @offset: used by marshalling routines to track currentposition in buffer
  * @capacity: used by marshalling routines to track total capacity
+ * @pubuf: Payload user buffer given by the caller
+ * @pubuf: Payload kernel buffer given by the caller
+ * @pbuf_size: pubuf/pkbuf(only one will be !NULL) size to be read/write.
+ * @private: For transport layer's use.
  * @sdata: payload
  *
  * &p9_fcall represents the structure for all 9P RPC
@@ -705,6 +709,10 @@ struct p9_fcall {
 
 	size_t offset;
 	size_t capacity;
+	char __user *pubuf;
+	char *pkbuf;
+	size_t pbuf_size;
+	void *private;
 
 	uint8_t *sdata;
 };
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 1e308f2..d888847 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -606,6 +606,10 @@ void p9pdu_reset(struct p9_fcall *pdu)
 {
 	pdu->offset = 0;
 	pdu->size = 0;
+	pdu->private = NULL;
+	pdu->pubuf = NULL;
+	pdu->pkbuf = NULL;
+	pdu->pbuf_size = 0;
 }
 
 int p9dirent_read(char *buf, int len, struct p9_dirent *dirent,
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 2/7] [net/9p] Adds supporting functions for zero copy.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 1/7] [net/9p] Additional elements to p9_fcall to acoomodate zero copy Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed Venkateswararao Jujjuri (JV)
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 net/9p/Makefile       |    1 +
 net/9p/trans_common.c |   93 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/9p/trans_common.h |   29 +++++++++++++++
 3 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 net/9p/trans_common.c
 create mode 100644 net/9p/trans_common.h

diff --git a/net/9p/Makefile b/net/9p/Makefile
index 198a640..a0874cc 100644
--- a/net/9p/Makefile
+++ b/net/9p/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o
 	util.o \
 	protocol.o \
 	trans_fd.o \
+	trans_common.o \
 
 9pnet_virtio-objs := \
 	trans_virtio.o \
diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
new file mode 100644
index 0000000..ca705f1
--- /dev/null
+++ b/net/9p/trans_common.c
@@ -0,0 +1,93 @@
+/*
+ * Copyright IBM Corporation, 2010
+ * Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <net/9p/9p.h>
+#include <net/9p/client.h>
+#include <linux/scatterlist.h>
+#include "trans_common.h"
+
+/**
+ *  p9_release_req_pages - Release pages after the transaction.
+ *  @*private: PDU's private page of struct trans_rpage_info
+ */
+void
+p9_release_req_pages(struct trans_rpage_info *rpinfo)
+{
+	int i = 0;
+
+	while (rpinfo->rp_data[i] && rpinfo->rp_nr_pages--) {
+		put_page(rpinfo->rp_data[i]);
+		i++;
+	}
+}
+
+/**
+ * p9_nr_pages - Return number of pages needed to accomodate the payload.
+ */
+int
+p9_nr_pages(struct p9_req_t *req)
+{
+	int start_page, end_page;
+	start_page =  (unsigned long long)req->tc->pubuf >> PAGE_SHIFT;
+	end_page = ((unsigned long long)req->tc->pubuf + req->tc->pbuf_size +
+			PAGE_SIZE - 1) >> PAGE_SHIFT;
+	return end_page - start_page;
+}
+
+/**
+ * payload_gup - Translates user buffer into kernel pages and
+ * pins them either for read/write through get_user_pages_fast().
+ * @req: Request to be sent to server.
+ * @pdata_off: data offset into the first page after translation (gup).
+ * @pdata_len: Total length of the IO. gup may not return requested # of pages.
+ * @nr_pages: number of pages to accomodate the payload
+ * @rw: Indicates if the pages are for read or write.
+ */
+int
+p9_payload_gup(struct p9_req_t *req, size_t *pdata_off, int *pdata_len,
+		int nr_pages, u8 rw)
+{
+	uint32_t first_page_bytes = 0;
+	uint32_t pdata_mapped_pages;
+	struct trans_rpage_info  *rpinfo;
+
+	*pdata_off = (size_t)req->tc->pubuf & (PAGE_SIZE-1);
+
+	if (*pdata_off)
+		first_page_bytes = min((PAGE_SIZE - *pdata_off),
+				req->tc->pbuf_size);
+
+	rpinfo = req->tc->private;
+	pdata_mapped_pages = get_user_pages_fast((unsigned long)req->tc->pubuf,
+			nr_pages, rw, &rpinfo->rp_data[0]);
+
+	if (pdata_mapped_pages < 0) {
+		printk(KERN_ERR "get_user_pages_fast failed:%d udata:%p"
+				"nr_pages:%d\n", pdata_mapped_pages,
+				req->tc->pubuf, nr_pages);
+		pdata_mapped_pages = 0;
+		return -EIO;
+	}
+	rpinfo->rp_nr_pages = pdata_mapped_pages;
+	if (*pdata_off) {
+		*pdata_len = first_page_bytes;
+		*pdata_len += min((req->tc->pbuf_size - *pdata_len),
+				((size_t)pdata_mapped_pages - 1) << PAGE_SHIFT);
+	} else {
+		*pdata_len = min(req->tc->pbuf_size,
+				(size_t)pdata_mapped_pages << PAGE_SHIFT);
+	}
+	return 0;
+}
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
new file mode 100644
index 0000000..04977e0
--- /dev/null
+++ b/net/9p/trans_common.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright IBM Corporation, 2010
+ * Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+/**
+ * struct trans_rpage_info - To store mapped page information in PDU.
+ * @rp_alloc:Set if this structure is allocd, not a reuse unused space in pdu.
+ * @rp_nr_pages: Number of mapped pages
+ * @rp_data: Array of page pointers
+ */
+struct trans_rpage_info {
+	u8 rp_alloc;
+	int rp_nr_pages;
+	struct page *rp_data[0];
+};
+
+void p9_release_req_pages(struct trans_rpage_info *);
+int p9_payload_gup(struct p9_req_t *, size_t *, int *, int, u8);
+int p9_nr_pages(struct p9_req_t *);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 1/7] [net/9p] Additional elements to p9_fcall to acoomodate zero copy Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 2/7] [net/9p] Adds supporting functions for " Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11  6:59   ` Aneesh Kumar K. V
  2011-02-11  1:25 ` [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer Venkateswararao Jujjuri (JV)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 net/9p/protocol.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index d888847..5936c50 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -579,6 +579,7 @@ EXPORT_SYMBOL(p9stat_read);
 
 int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type)
 {
+	pdu->id = type;
 	return p9pdu_writef(pdu, 0, "dbw", 0, type, tag);
 }
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
                   ` (2 preceding siblings ...)
  2011-02-11  1:25 ` [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11  7:07   ` Aneesh Kumar K. V
  2011-02-11 17:58   ` Aneesh Kumar K. V
  2011-02-11  1:25 ` [RFC-V3 5/7] [net/9p] Add preferences to " Venkateswararao Jujjuri (JV)
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 net/9p/trans_virtio.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index c8f3f72..e0c301f 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -45,6 +45,7 @@
 #include <linux/scatterlist.h>
 #include <linux/virtio.h>
 #include <linux/virtio_9p.h>
+#include "trans_common.h"
 
 #define VIRTQUEUE_NUM	128
 
@@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
 					rc->tag);
 			req = p9_tag_lookup(chan->client, rc->tag);
 			req->status = REQ_STATUS_RCVD;
+			if (req->tc->private) {
+				struct trans_rpage_info *rp = req->tc->private;
+				/*Release pages */
+				p9_release_req_pages(rp);
+				if (rp->rp_alloc)
+					kfree(rp);
+				req->tc->private = NULL;
+			}
 			p9_client_cb(chan->client, req);
 		} else {
 			spin_unlock_irqrestore(&chan->lock, flags);
@@ -202,6 +211,30 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
 	return 1;
 }
 
+static int
+pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
+		struct page **pdata, int count)
+{
+	int s;
+	int i = 0;
+	int index = start;
+
+	if (pdata_off) {
+		s = min((int)(PAGE_SIZE - pdata_off), count);
+		sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
+		count -= s;
+	}
+
+	while (count) {
+		BUG_ON(index > limit);
+		s = min((int)PAGE_SIZE, count);
+		sg_set_page(&sg[index++], pdata[i++], s, 0);
+		count -= s;
+	}
+
+	return index-start;
+}
+
 /**
  * p9_virtio_request - issue a request
  * @client: client instance issuing the request
@@ -212,22 +245,82 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
 static int
 p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
 {
-	int in, out;
+	int in, out, inp, outp;
 	struct virtio_chan *chan = client->trans;
 	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
 	unsigned long flags;
-	int err;
+	size_t pdata_off = 0;
+	struct trans_rpage_info *rpinfo = NULL;
+	int err, pdata_len = 0;
 
 	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
 
 req_retry:
 	req->status = REQ_STATUS_SENT;
 
+	if (req->tc->pbuf_size &&
+			(req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS))) {
+		int nr_pages = p9_nr_pages(req);
+		int rpinfo_size = sizeof(struct trans_rpage_info) +
+			sizeof(struct page *) * nr_pages;
+
+		if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
+			/* We can use sdata */
+			req->tc->private = req->tc->sdata + req->tc->size;
+			rpinfo = (struct trans_rpage_info *)req->tc->private;
+			rpinfo->rp_alloc = 0;
+		} else {
+			req->tc->private = kmalloc(rpinfo_size, GFP_KERNEL);
+			rpinfo = (struct trans_rpage_info *)req->tc->private;
+			rpinfo->rp_alloc = 1;
+		}
+
+		err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
+				req->tc->id == P9_TREAD ? 1 : 0);
+		if (err < 0) {
+			if (rpinfo->rp_alloc)
+				kfree(rpinfo);
+			return err;
+		}
+	}
+
 	spin_lock_irqsave(&chan->lock, flags);
-	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
-								req->tc->size);
-	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
-								client->msize);
+
+	/* Handle out VirtIO ring buffers */
+	if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
+		/* We have additional write payload buffer to take care */
+		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
+				req->tc->size);
+		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
+				pdata_off, rpinfo->rp_data, pdata_len);
+		out += outp;
+	} else {
+		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
+				req->tc->size);
+	}
+
+	/* Handle in VirtIO ring buffers */
+	if (req->tc->pbuf_size && (req->tc->id == P9_TREAD)) {
+		/* We have additional Read payload buffer to take care */
+		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
+		/*
+		 * Running executables in the filesystem may result in
+		 * a read request with kernel buffer as opposed to user buffer.
+		 */
+		if (req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS)) {
+			in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
+					pdata_off, rpinfo->rp_data, pdata_len);
+		} else {
+			char *pbuf = req->tc->pubuf ? req->tc->pubuf :
+								req->tc->pkbuf;
+			in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM,
+					pbuf, req->tc->pbuf_size);
+		}
+		in += inp;
+	} else {
+		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
+				client->msize);
+	}
 
 	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
 	if (err < 0) {
@@ -246,6 +339,8 @@ req_retry:
 			P9_DPRINTK(P9_DEBUG_TRANS,
 					"9p debug: "
 					"virtio rpc add_buf returned failure");
+			if (rpinfo && rpinfo->rp_alloc)
+				kfree(rpinfo);
 			return -EIO;
 		}
 	}
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 5/7] [net/9p] Add preferences to transport layer.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
                   ` (3 preceding siblings ...)
  2011-02-11  1:25 ` [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol Venkateswararao Jujjuri (JV)
  2011-02-11  1:25 ` [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case Venkateswararao Jujjuri (JV)
  6 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

This patch adds preferences field to the p9_trans_module.
Through this, now transport layer can express its preference about the
payload. i.e if payload neds to be part of the PDU or it prefers it
to be sent sepearetly so that the transport layer can handle it in
a better way.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 include/net/9p/transport.h |    9 +++++++++
 net/9p/trans_virtio.c      |    1 +
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 6d5886e..82868f1 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -26,11 +26,19 @@
 #ifndef NET_9P_TRANSPORT_H
 #define NET_9P_TRANSPORT_H
 
+#define P9_TRANS_PREF_PAYLOAD_MASK 0x1
+
+/* Default. Add Payload to PDU before sending it down to transport layer */
+#define P9_TRANS_PREF_PAYLOAD_DEF  0x0
+/* Send pay load seperately to transport layer along with PDU.*/
+#define P9_TRANS_PREF_PAYLOAD_SEP  0x1
+
 /**
  * struct p9_trans_module - transport module interface
  * @list: used to maintain a list of currently available transports
  * @name: the human-readable name of the transport
  * @maxsize: transport provided maximum packet size
+ * @pref: Preferences of this transport
  * @def: set if this transport should be considered the default
  * @create: member function to create a new connection on this transport
  * @request: member function to issue a request to the transport
@@ -47,6 +55,7 @@ struct p9_trans_module {
 	struct list_head list;
 	char *name;		/* name of transport */
 	int maxsize;		/* max message size of transport */
+	int pref;               /* Preferences of this transport */
 	int def;		/* this transport should be default */
 	struct module *owner;
 	int (*create)(struct p9_client *, const char *, char *);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index e0c301f..f75da37 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -543,6 +543,7 @@ static struct p9_trans_module p9_virtio_trans = {
 	.request = p9_virtio_request,
 	.cancel = p9_virtio_cancel,
 	.maxsize = PAGE_SIZE*16,
+	.pref = P9_TRANS_PREF_PAYLOAD_SEP,
 	.def = 0,
 	.owner = THIS_MODULE,
 };
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
                   ` (4 preceding siblings ...)
  2011-02-11  1:25 ` [RFC-V3 5/7] [net/9p] Add preferences to " Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11 19:35   ` Aneesh Kumar K. V
  2011-02-11  1:25 ` [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case Venkateswararao Jujjuri (JV)
  6 siblings, 1 reply; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 net/9p/client.c   |   47 +++++++++++++++++++++++++++++++++--------------
 net/9p/protocol.c |   21 +++++++++++++++++++++
 2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index a848bca..f6d8531 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1270,7 +1270,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
 	if (count < rsize)
 		rsize = count;
 
-	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
+	/* Don't bother zerocopy form small IO (< 1024) */
+	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
+			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
+		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
+				rsize, data, udata);
+	} else {
+		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
+				rsize);
+	}
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto error;
@@ -1284,13 +1292,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
 
 	P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
 
-	if (data) {
-		memmove(data, dataptr, count);
-	} else {
-		err = copy_to_user(udata, dataptr, count);
-		if (err) {
-			err = -EFAULT;
-			goto free_and_error;
+	if (!req->tc->pbuf_size) {
+		if (data) {
+			memmove(data, dataptr, count);
+		} else {
+			err = copy_to_user(udata, dataptr, count);
+			if (err) {
+				err = -EFAULT;
+				goto free_and_error;
+			}
 		}
 	}
 	p9_free_req(clnt, req);
@@ -1323,12 +1333,21 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
 
 	if (count < rsize)
 		rsize = count;
-	if (data)
-		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
-								rsize, data);
-	else
-		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
-								rsize, udata);
+
+	/* Don't bother zerocopy form small IO (< 1024) */
+	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
+			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
+		req = p9_client_rpc(clnt, P9_TWRITE, "dqE", fid->fid, offset,
+				rsize, data, udata);
+	} else {
+		if (data)
+			req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
+					offset, rsize, data);
+		else
+			req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
+					offset, rsize, udata);
+	}
+
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto error;
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 5936c50..830b999 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -114,6 +114,17 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
 	return size - len;
 }
 
+static size_t
+pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
+		size_t size)
+{
+	size_t len = min(pdu->capacity - pdu->size, size);
+	pdu->pubuf = (char __user *)udata;
+	pdu->pkbuf = (char *)kdata;
+	pdu->pbuf_size = len;
+	return size - len;
+}
+
 /*
 	b - int8_t
 	w - int16_t
@@ -445,6 +456,16 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 					errcode = -EFAULT;
 			}
 			break;
+		case 'E':{
+				 int32_t cnt = va_arg(ap, int32_t);
+				 const char *k = va_arg(ap, const void *);
+				 const char *u = va_arg(ap, const void *);
+				 errcode = p9pdu_writef(pdu, proto_version, "d",
+						 cnt);
+				 if (!errcode && pdu_write_urw(pdu, k, u, cnt))
+					errcode = -EFAULT;
+			 }
+			 break;
 		case 'U':{
 				int32_t count = va_arg(ap, int32_t);
 				const char __user *udata =
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case.
  2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
                   ` (5 preceding siblings ...)
  2011-02-11  1:25 ` [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol Venkateswararao Jujjuri (JV)
@ 2011-02-11  1:25 ` Venkateswararao Jujjuri (JV)
  2011-02-11 18:10   ` Aneesh Kumar K. V
  6 siblings, 1 reply; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11  1:25 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

In addition, this patch also avoids zero copy for short reads in !dotl case.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
 net/9p/client.c |   64 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index f6d8531..3e51273 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -443,6 +443,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
 {
 	int8_t type;
 	int err;
+	int ecode;
 
 	err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
 	if (err) {
@@ -450,36 +451,53 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
 		return err;
 	}
 
-	if (type == P9_RERROR || type == P9_RLERROR) {
-		int ecode;
-
-		if (!p9_is_proto_dotl(c)) {
-			char *ename;
+	if (type != P9_RERROR && type != P9_RLERROR)
+		return 0;
 
-			err = p9pdu_readf(req->rc, c->proto_version, "s?d",
-								&ename, &ecode);
-			if (err)
-				goto out_err;
+	if (!p9_is_proto_dotl(c)) {
+		char *ename;
+
+		if (req->tc->pbuf_size) {
+			/* Handle user buffers */
+			size_t len = req->rc->size - req->rc->offset;
+			if (req->tc->pubuf) {
+				/* User Buffer */
+				err = copy_from_user(
+					&req->rc->sdata[req->rc->offset],
+					req->tc->pubuf, len);
+				if (err) {
+					err = -EFAULT;
+					return err;
+				}
+			} else {
+				/* Kernel Buffer */
+				memmove(&req->rc->sdata[req->rc->offset],
+						req->tc->pkbuf, len);
+			}
+		}
+		err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+				&ename, &ecode);
+		if (err)
+			goto out_err;
 
-			if (p9_is_proto_dotu(c))
-				err = -ecode;
+		if (p9_is_proto_dotu(c))
+			err = -ecode;
 
-			if (!err || !IS_ERR_VALUE(err)) {
-				err = p9_errstr2errno(ename, strlen(ename));
+		if (!err || !IS_ERR_VALUE(err)) {
+			err = p9_errstr2errno(ename, strlen(ename));
 
-				P9_DPRINTK(P9_DEBUG_9P, "<<< RERROR (%d) %s\n", -ecode, ename);
+			P9_DPRINTK(P9_DEBUG_9P, "<<< RERROR (%d) %s\n", -ecode,
+					ename);
 
-				kfree(ename);
-			}
-		} else {
-			err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
-			err = -ecode;
-
-			P9_DPRINTK(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
+			kfree(ename);
 		}
+	} else {
+		err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+		err = -ecode;
+
+		P9_DPRINTK(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
+	}
 
-	} else
-		err = 0;
 
 	return err;
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed.
  2011-02-11  1:25 ` [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed Venkateswararao Jujjuri (JV)
@ 2011-02-11  6:59   ` Aneesh Kumar K. V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K. V @ 2011-02-11  6:59 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV), v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

On Thu, 10 Feb 2011 17:25:07 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>


Can you add extra commit info stating we are going to use the pdu->id
later in the patch. 

> ---
>  net/9p/protocol.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
> index d888847..5936c50 100644
> --- a/net/9p/protocol.c
> +++ b/net/9p/protocol.c
> @@ -579,6 +579,7 @@ EXPORT_SYMBOL(p9stat_read);
> 
>  int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type)
>  {
> +	pdu->id = type;
>  	return p9pdu_writef(pdu, 0, "dbw", 0, type, tag);
>  }
> 
> -- 
> 1.6.5.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
  2011-02-11  1:25 ` [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer Venkateswararao Jujjuri (JV)
@ 2011-02-11  7:07   ` Aneesh Kumar K. V
  2011-02-11 16:08     ` Venkateswararao Jujjuri (JV)
  2011-02-11 17:58   ` Aneesh Kumar K. V
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K. V @ 2011-02-11  7:07 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV), v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

On Thu, 10 Feb 2011 17:25:08 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>

Can we remove the checks for request type in p9_virtio_request. Current
we have

p9_client_read()
{
        if (zero coy enabled) {
                     ...
        } else {
                .....
        }

}

Then again in p9_virtio_request

p9_virtio_request()
{
                if (request->id == P9_TREAD)
         


}

It would be nice if we can avoid doing that request->id check in
p9_virtio_request. So that if we want to add zero copy for any other
type of request we won't need to change p9_virtio_rquest.

-aneesh

> ---
>  net/9p/trans_virtio.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 101 insertions(+), 6 deletions(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index c8f3f72..e0c301f 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -45,6 +45,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/virtio.h>
>  #include <linux/virtio_9p.h>
> +#include "trans_common.h"
> 
>  #define VIRTQUEUE_NUM	128
> 
> @@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
>  					rc->tag);
>  			req = p9_tag_lookup(chan->client, rc->tag);
>  			req->status = REQ_STATUS_RCVD;
> +			if (req->tc->private) {
> +				struct trans_rpage_info *rp = req->tc->private;
> +				/*Release pages */
> +				p9_release_req_pages(rp);
> +				if (rp->rp_alloc)
> +					kfree(rp);
> +				req->tc->private = NULL;
> +			}
>  			p9_client_cb(chan->client, req);
>  		} else {
>  			spin_unlock_irqrestore(&chan->lock, flags);
> @@ -202,6 +211,30 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>  	return 1;
>  }
> 
> +static int
> +pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
> +		struct page **pdata, int count)
> +{
> +	int s;
> +	int i = 0;
> +	int index = start;
> +
> +	if (pdata_off) {
> +		s = min((int)(PAGE_SIZE - pdata_off), count);
> +		sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
> +		count -= s;
> +	}
> +
> +	while (count) {
> +		BUG_ON(index > limit);
> +		s = min((int)PAGE_SIZE, count);
> +		sg_set_page(&sg[index++], pdata[i++], s, 0);
> +		count -= s;
> +	}
> +
> +	return index-start;
> +}
> +
>  /**
>   * p9_virtio_request - issue a request
>   * @client: client instance issuing the request
> @@ -212,22 +245,82 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>  static int
>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	int in, out;
> +	int in, out, inp, outp;
>  	struct virtio_chan *chan = client->trans;
>  	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>  	unsigned long flags;
> -	int err;
> +	size_t pdata_off = 0;
> +	struct trans_rpage_info *rpinfo = NULL;
> +	int err, pdata_len = 0;
> 
>  	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
> 
>  req_retry:
>  	req->status = REQ_STATUS_SENT;
> 
> +	if (req->tc->pbuf_size &&
> +			(req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS))) {
> +		int nr_pages = p9_nr_pages(req);
> +		int rpinfo_size = sizeof(struct trans_rpage_info) +
> +			sizeof(struct page *) * nr_pages;
> +
> +		if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
> +			/* We can use sdata */
> +			req->tc->private = req->tc->sdata + req->tc->size;
> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
> +			rpinfo->rp_alloc = 0;
> +		} else {
> +			req->tc->private = kmalloc(rpinfo_size, GFP_KERNEL);
> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
> +			rpinfo->rp_alloc = 1;
> +		}
> +
> +		err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
> +				req->tc->id == P9_TREAD ? 1 : 0);
> +		if (err < 0) {
> +			if (rpinfo->rp_alloc)
> +				kfree(rpinfo);
> +			return err;
> +		}
> +	}
> +
>  	spin_lock_irqsave(&chan->lock, flags);
> -	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> -								req->tc->size);
> -	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
> -								client->msize);
> +
> +	/* Handle out VirtIO ring buffers */
> +	if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
> +		/* We have additional write payload buffer to take care */
> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> +				req->tc->size);
> +		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
> +				pdata_off, rpinfo->rp_data, pdata_len);
> +		out += outp;
> +	} else {
> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> +				req->tc->size);
> +	}
> +
> +	/* Handle in VirtIO ring buffers */
> +	if (req->tc->pbuf_size && (req->tc->id == P9_TREAD)) {
> +		/* We have additional Read payload buffer to take care */
> +		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
> +		/*
> +		 * Running executables in the filesystem may result in
> +		 * a read request with kernel buffer as opposed to user buffer.
> +		 */
> +		if (req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS)) {
> +			in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
> +					pdata_off, rpinfo->rp_data, pdata_len);
> +		} else {
> +			char *pbuf = req->tc->pubuf ? req->tc->pubuf :
> +								req->tc->pkbuf;
> +			in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM,
> +					pbuf, req->tc->pbuf_size);
> +		}
> +		in += inp;
> +	} else {
> +		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
> +				client->msize);
> +	}
> 
>  	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
>  	if (err < 0) {
> @@ -246,6 +339,8 @@ req_retry:
>  			P9_DPRINTK(P9_DEBUG_TRANS,
>  					"9p debug: "
>  					"virtio rpc add_buf returned failure");
> +			if (rpinfo && rpinfo->rp_alloc)
> +				kfree(rpinfo);
>  			return -EIO;
>  		}
>  	}
> -- 
> 1.6.5.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
  2011-02-11  7:07   ` Aneesh Kumar K. V
@ 2011-02-11 16:08     ` Venkateswararao Jujjuri (JV)
  0 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11 16:08 UTC (permalink / raw)
  To: Aneesh Kumar K. V; +Cc: v9fs-developer, linux-fsdevel

On 2/10/2011 11:07 PM, Aneesh Kumar K. V wrote:
> On Thu, 10 Feb 2011 17:25:08 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
> 
> Can we remove the checks for request type in p9_virtio_request. Current
> we have
> 
> p9_client_read()
> {
>         if (zero coy enabled) {
>                      ...
>         } else {
>                 .....
>         }
> 
> }
> 
> Then again in p9_virtio_request
> 
> p9_virtio_request()
> {
>                 if (request->id == P9_TREAD)
>          
> 
> 
> }
> 
> It would be nice if we can avoid doing that request->id check in
> p9_virtio_request. So that if we want to add zero copy for any other
> type of request we won't need to change p9_virtio_rquest.

We need to know READ/WRITE information to choose where to place the buffers (in/out)
on the VirtIO queue. My initial proposal is to add a flag in the  p9_fcall
instead of checking id.
But, Eric asked me to do this way because apparently it is fine to look at id
information in the
transport layer.

- JV

> 
> -aneesh
> 
>> ---
>>  net/9p/trans_virtio.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 files changed, 101 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>> index c8f3f72..e0c301f 100644
>> --- a/net/9p/trans_virtio.c
>> +++ b/net/9p/trans_virtio.c
>> @@ -45,6 +45,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/virtio.h>
>>  #include <linux/virtio_9p.h>
>> +#include "trans_common.h"
>>
>>  #define VIRTQUEUE_NUM	128
>>
>> @@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
>>  					rc->tag);
>>  			req = p9_tag_lookup(chan->client, rc->tag);
>>  			req->status = REQ_STATUS_RCVD;
>> +			if (req->tc->private) {
>> +				struct trans_rpage_info *rp = req->tc->private;
>> +				/*Release pages */
>> +				p9_release_req_pages(rp);
>> +				if (rp->rp_alloc)
>> +					kfree(rp);
>> +				req->tc->private = NULL;
>> +			}
>>  			p9_client_cb(chan->client, req);
>>  		} else {
>>  			spin_unlock_irqrestore(&chan->lock, flags);
>> @@ -202,6 +211,30 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  	return 1;
>>  }
>>
>> +static int
>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
>> +		struct page **pdata, int count)
>> +{
>> +	int s;
>> +	int i = 0;
>> +	int index = start;
>> +
>> +	if (pdata_off) {
>> +		s = min((int)(PAGE_SIZE - pdata_off), count);
>> +		sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
>> +		count -= s;
>> +	}
>> +
>> +	while (count) {
>> +		BUG_ON(index > limit);
>> +		s = min((int)PAGE_SIZE, count);
>> +		sg_set_page(&sg[index++], pdata[i++], s, 0);
>> +		count -= s;
>> +	}
>> +
>> +	return index-start;
>> +}
>> +
>>  /**
>>   * p9_virtio_request - issue a request
>>   * @client: client instance issuing the request
>> @@ -212,22 +245,82 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  static int
>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>  {
>> -	int in, out;
>> +	int in, out, inp, outp;
>>  	struct virtio_chan *chan = client->trans;
>>  	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>  	unsigned long flags;
>> -	int err;
>> +	size_t pdata_off = 0;
>> +	struct trans_rpage_info *rpinfo = NULL;
>> +	int err, pdata_len = 0;
>>
>>  	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>
>>  req_retry:
>>  	req->status = REQ_STATUS_SENT;
>>
>> +	if (req->tc->pbuf_size &&
>> +			(req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS))) {
>> +		int nr_pages = p9_nr_pages(req);
>> +		int rpinfo_size = sizeof(struct trans_rpage_info) +
>> +			sizeof(struct page *) * nr_pages;
>> +
>> +		if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
>> +			/* We can use sdata */
>> +			req->tc->private = req->tc->sdata + req->tc->size;
>> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
>> +			rpinfo->rp_alloc = 0;
>> +		} else {
>> +			req->tc->private = kmalloc(rpinfo_size, GFP_KERNEL);
>> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
>> +			rpinfo->rp_alloc = 1;
>> +		}
>> +
>> +		err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
>> +				req->tc->id == P9_TREAD ? 1 : 0);
>> +		if (err < 0) {
>> +			if (rpinfo->rp_alloc)
>> +				kfree(rpinfo);
>> +			return err;
>> +		}
>> +	}
>> +
>>  	spin_lock_irqsave(&chan->lock, flags);
>> -	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> -								req->tc->size);
>> -	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>> -								client->msize);
>> +
>> +	/* Handle out VirtIO ring buffers */
>> +	if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
>> +		/* We have additional write payload buffer to take care */
>> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> +				req->tc->size);
>> +		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>> +				pdata_off, rpinfo->rp_data, pdata_len);
>> +		out += outp;
>> +	} else {
>> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> +				req->tc->size);
>> +	}
>> +
>> +	/* Handle in VirtIO ring buffers */
>> +	if (req->tc->pbuf_size && (req->tc->id == P9_TREAD)) {
>> +		/* We have additional Read payload buffer to take care */
>> +		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>> +		/*
>> +		 * Running executables in the filesystem may result in
>> +		 * a read request with kernel buffer as opposed to user buffer.
>> +		 */
>> +		if (req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS)) {
>> +			in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>> +					pdata_off, rpinfo->rp_data, pdata_len);
>> +		} else {
>> +			char *pbuf = req->tc->pubuf ? req->tc->pubuf :
>> +								req->tc->pkbuf;
>> +			in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM,
>> +					pbuf, req->tc->pbuf_size);
>> +		}
>> +		in += inp;
>> +	} else {
>> +		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>> +				client->msize);
>> +	}
>>
>>  	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
>>  	if (err < 0) {
>> @@ -246,6 +339,8 @@ req_retry:
>>  			P9_DPRINTK(P9_DEBUG_TRANS,
>>  					"9p debug: "
>>  					"virtio rpc add_buf returned failure");
>> +			if (rpinfo && rpinfo->rp_alloc)
>> +				kfree(rpinfo);
>>  			return -EIO;
>>  		}
>>  	}
>> -- 
>> 1.6.5.2
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
  2011-02-11  1:25 ` [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer Venkateswararao Jujjuri (JV)
  2011-02-11  7:07   ` Aneesh Kumar K. V
@ 2011-02-11 17:58   ` Aneesh Kumar K. V
  2011-02-11 18:42     ` Venkateswararao Jujjuri (JV)
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K. V @ 2011-02-11 17:58 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV), v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

On Thu, 10 Feb 2011 17:25:08 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
> ---
>  net/9p/trans_virtio.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 101 insertions(+), 6 deletions(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index c8f3f72..e0c301f 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -45,6 +45,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/virtio.h>
>  #include <linux/virtio_9p.h>
> +#include "trans_common.h"
> 
>  #define VIRTQUEUE_NUM	128
> 
> @@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
>  					rc->tag);
>  			req = p9_tag_lookup(chan->client, rc->tag);
>  			req->status = REQ_STATUS_RCVD;
> +			if (req->tc->private) {
> +				struct trans_rpage_info *rp = req->tc->private;
> +				/*Release pages */
> +				p9_release_req_pages(rp);
> +				if (rp->rp_alloc)
> +					kfree(rp);
> +				req->tc->private = NULL;
> +			}
>  			p9_client_cb(chan->client, req);
>  		} else {
>  			spin_unlock_irqrestore(&chan->lock, flags);
> @@ -202,6 +211,30 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>  	return 1;
>  }
> 
> +static int
> +pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
> +		struct page **pdata, int count)
> +{
> +	int s;
> +	int i = 0;
> +	int index = start;
> +
> +	if (pdata_off) {
> +		s = min((int)(PAGE_SIZE - pdata_off), count);
> +		sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
> +		count -= s;
> +	}
> +
> +	while (count) {
> +		BUG_ON(index > limit);
> +		s = min((int)PAGE_SIZE, count);
> +		sg_set_page(&sg[index++], pdata[i++], s, 0);
> +		count -= s;
> +	}
> +
> +	return index-start;
> +}
> +
>  /**
>   * p9_virtio_request - issue a request
>   * @client: client instance issuing the request
> @@ -212,22 +245,82 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>  static int
>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	int in, out;
> +	int in, out, inp, outp;
>  	struct virtio_chan *chan = client->trans;
>  	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>  	unsigned long flags;
> -	int err;
> +	size_t pdata_off = 0;
> +	struct trans_rpage_info *rpinfo = NULL;
> +	int err, pdata_len = 0;
> 
>  	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
> 
>  req_retry:
>  	req->status = REQ_STATUS_SENT;
> 
> +	if (req->tc->pbuf_size &&
> +			(req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS))) {
> +		int nr_pages = p9_nr_pages(req);
> +		int rpinfo_size = sizeof(struct trans_rpage_info) +
> +			sizeof(struct page *) * nr_pages;
> +
> +		if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
> +			/* We can use sdata */
> +			req->tc->private = req->tc->sdata + req->tc->size;
> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
> +			rpinfo->rp_alloc = 0;
> +		} else {
> +			req->tc->private = kmalloc(rpinfo_size, GFP_KERNEL);
> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
> +			rpinfo->rp_alloc = 1;
> +		}
> +
> +		err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
> +				req->tc->id == P9_TREAD ? 1 : 0);
> +		if (err < 0) {
> +			if (rpinfo->rp_alloc)
> +				kfree(rpinfo);
> +			return err;
> +		}
> +	}
> +
>  	spin_lock_irqsave(&chan->lock, flags);
> -	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> -								req->tc->size);
> -	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
> -								client->msize);
> +
> +	/* Handle out VirtIO ring buffers */
> +	if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
> +		/* We have additional write payload buffer to take care */
> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> +				req->tc->size);
> +		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
> +				pdata_off, rpinfo->rp_data, pdata_len);
> +		out += outp;
> +	} else {
> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
> +				req->tc->size);
> +	}

How about  ?

  out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
  /*
   * We have additional write payload buffer to take care
   * in case of zero copy and the request is TWRITE
   */
   if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
          outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
				pdata_off, rpinfo->rp_data, pdata_len);
	  out += outp;
   }


> +
> +	/* Handle in VirtIO ring buffers */
> +	if (req->tc->pbuf_size && (req->tc->id == P9_TREAD)) {
> +		/* We have additional Read payload buffer to take care */
> +		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
> +		/*
> +		 * Running executables in the filesystem may result in
> +		 * a read request with kernel buffer as opposed to user buffer.
> +		 */
> +		if (req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS)) {
> +			in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
> +					pdata_off, rpinfo->rp_data, pdata_len);
> +		} else {
> +			char *pbuf = req->tc->pubuf ? req->tc->pubuf :
> +								req->tc->pkbuf;
> +			in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM,
> +					pbuf, req->tc->pbuf_size);
> +		}
> +		in += inp;
> +	} else {
> +		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
> +				client->msize);
> +	}
> 
>  	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
>  	if (err < 0) {
> @@ -246,6 +339,8 @@ req_retry:
>  			P9_DPRINTK(P9_DEBUG_TRANS,
>  					"9p debug: "
>  					"virtio rpc add_buf returned failure");
> +			if (rpinfo && rpinfo->rp_alloc)
> +				kfree(rpinfo);
>  			return -EIO;
>  		}
>  	}

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case.
  2011-02-11  1:25 ` [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case Venkateswararao Jujjuri (JV)
@ 2011-02-11 18:10   ` Aneesh Kumar K. V
  2011-02-11 18:46     ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K. V @ 2011-02-11 18:10 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV), v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

On Thu, 10 Feb 2011 17:25:11 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
> In addition, this patch also avoids zero copy for short reads in !dotl case.
> 
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
> ---
>  net/9p/client.c |   64 +++++++++++++++++++++++++++++++++++-------------------
>  1 files changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index f6d8531..3e51273 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -443,6 +443,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>  {
>  	int8_t type;
>  	int err;
> +	int ecode;
> 
>  	err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
>  	if (err) {
> @@ -450,36 +451,53 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>  		return err;
>  	}
> 
> -	if (type == P9_RERROR || type == P9_RLERROR) {
> -		int ecode;
> -
> -		if (!p9_is_proto_dotl(c)) {
> -			char *ename;
> +	if (type != P9_RERROR && type != P9_RLERROR)
> +		return 0;
> 
> -			err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> -								&ename, &ecode);
> -			if (err)
> -				goto out_err;
> +	if (!p9_is_proto_dotl(c)) {
> +		char *ename;
> +
> +		if (req->tc->pbuf_size) {
> +			/* Handle user buffers */
> +			size_t len = req->rc->size - req->rc->offset;
> +			if (req->tc->pubuf) {
> +				/* User Buffer */
> +				err = copy_from_user(
> +					&req->rc->sdata[req->rc->offset],
> +					req->tc->pubuf, len);
> +				if (err) {
> +					err = -EFAULT;
> +					return err;
> +				}

Will this handle error resulting from kernel_read ?. I guess we have a
kernel address there.


> +			} else {
> +				/* Kernel Buffer */
> +				memmove(&req->rc->sdata[req->rc->offset],
> +						req->tc->pkbuf, len);
> +			}
> +		}

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
  2011-02-11 17:58   ` Aneesh Kumar K. V
@ 2011-02-11 18:42     ` Venkateswararao Jujjuri (JV)
  0 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11 18:42 UTC (permalink / raw)
  To: Aneesh Kumar K. V; +Cc: v9fs-developer, linux-fsdevel

On 2/11/2011 9:58 AM, Aneesh Kumar K. V wrote:
> On Thu, 10 Feb 2011 17:25:08 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>> ---
>>  net/9p/trans_virtio.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 files changed, 101 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>> index c8f3f72..e0c301f 100644
>> --- a/net/9p/trans_virtio.c
>> +++ b/net/9p/trans_virtio.c
>> @@ -45,6 +45,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/virtio.h>
>>  #include <linux/virtio_9p.h>
>> +#include "trans_common.h"
>>
>>  #define VIRTQUEUE_NUM	128
>>
>> @@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
>>  					rc->tag);
>>  			req = p9_tag_lookup(chan->client, rc->tag);
>>  			req->status = REQ_STATUS_RCVD;
>> +			if (req->tc->private) {
>> +				struct trans_rpage_info *rp = req->tc->private;
>> +				/*Release pages */
>> +				p9_release_req_pages(rp);
>> +				if (rp->rp_alloc)
>> +					kfree(rp);
>> +				req->tc->private = NULL;
>> +			}
>>  			p9_client_cb(chan->client, req);
>>  		} else {
>>  			spin_unlock_irqrestore(&chan->lock, flags);
>> @@ -202,6 +211,30 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  	return 1;
>>  }
>>
>> +static int
>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
>> +		struct page **pdata, int count)
>> +{
>> +	int s;
>> +	int i = 0;
>> +	int index = start;
>> +
>> +	if (pdata_off) {
>> +		s = min((int)(PAGE_SIZE - pdata_off), count);
>> +		sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
>> +		count -= s;
>> +	}
>> +
>> +	while (count) {
>> +		BUG_ON(index > limit);
>> +		s = min((int)PAGE_SIZE, count);
>> +		sg_set_page(&sg[index++], pdata[i++], s, 0);
>> +		count -= s;
>> +	}
>> +
>> +	return index-start;
>> +}
>> +
>>  /**
>>   * p9_virtio_request - issue a request
>>   * @client: client instance issuing the request
>> @@ -212,22 +245,82 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  static int
>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>  {
>> -	int in, out;
>> +	int in, out, inp, outp;
>>  	struct virtio_chan *chan = client->trans;
>>  	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>  	unsigned long flags;
>> -	int err;
>> +	size_t pdata_off = 0;
>> +	struct trans_rpage_info *rpinfo = NULL;
>> +	int err, pdata_len = 0;
>>
>>  	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>
>>  req_retry:
>>  	req->status = REQ_STATUS_SENT;
>>
>> +	if (req->tc->pbuf_size &&
>> +			(req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS))) {
>> +		int nr_pages = p9_nr_pages(req);
>> +		int rpinfo_size = sizeof(struct trans_rpage_info) +
>> +			sizeof(struct page *) * nr_pages;
>> +
>> +		if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
>> +			/* We can use sdata */
>> +			req->tc->private = req->tc->sdata + req->tc->size;
>> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
>> +			rpinfo->rp_alloc = 0;
>> +		} else {
>> +			req->tc->private = kmalloc(rpinfo_size, GFP_KERNEL);
>> +			rpinfo = (struct trans_rpage_info *)req->tc->private;
>> +			rpinfo->rp_alloc = 1;
>> +		}
>> +
>> +		err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
>> +				req->tc->id == P9_TREAD ? 1 : 0);
>> +		if (err < 0) {
>> +			if (rpinfo->rp_alloc)
>> +				kfree(rpinfo);
>> +			return err;
>> +		}
>> +	}
>> +
>>  	spin_lock_irqsave(&chan->lock, flags);
>> -	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> -								req->tc->size);
>> -	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>> -								client->msize);
>> +
>> +	/* Handle out VirtIO ring buffers */
>> +	if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
>> +		/* We have additional write payload buffer to take care */
>> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> +				req->tc->size);
>> +		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>> +				pdata_off, rpinfo->rp_data, pdata_len);
>> +		out += outp;
>> +	} else {
>> +		out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> +				req->tc->size);
>> +	}
> 
> How about  ?

Yes, This is something I wanted to do but missed out. Good catch.

Thanks,
JV

> 
>   out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
>   /*
>    * We have additional write payload buffer to take care
>    * in case of zero copy and the request is TWRITE
>    */
>    if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
>           outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
> 				pdata_off, rpinfo->rp_data, pdata_len);
> 	  out += outp;
>    }
> 
> 
>> +
>> +	/* Handle in VirtIO ring buffers */
>> +	if (req->tc->pbuf_size && (req->tc->id == P9_TREAD)) {
>> +		/* We have additional Read payload buffer to take care */
>> +		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>> +		/*
>> +		 * Running executables in the filesystem may result in
>> +		 * a read request with kernel buffer as opposed to user buffer.
>> +		 */
>> +		if (req->tc->pubuf && !segment_eq(get_fs(), KERNEL_DS)) {
>> +			in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>> +					pdata_off, rpinfo->rp_data, pdata_len);
>> +		} else {
>> +			char *pbuf = req->tc->pubuf ? req->tc->pubuf :
>> +								req->tc->pkbuf;
>> +			in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM,
>> +					pbuf, req->tc->pbuf_size);
>> +		}
>> +		in += inp;
>> +	} else {
>> +		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>> +				client->msize);
>> +	}
>>
>>  	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
>>  	if (err < 0) {
>> @@ -246,6 +339,8 @@ req_retry:
>>  			P9_DPRINTK(P9_DEBUG_TRANS,
>>  					"9p debug: "
>>  					"virtio rpc add_buf returned failure");
>> +			if (rpinfo && rpinfo->rp_alloc)
>> +				kfree(rpinfo);
>>  			return -EIO;
>>  		}
>>  	}
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case.
  2011-02-11 18:10   ` Aneesh Kumar K. V
@ 2011-02-11 18:46     ` Venkateswararao Jujjuri (JV)
  0 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11 18:46 UTC (permalink / raw)
  To: Aneesh Kumar K. V; +Cc: v9fs-developer, linux-fsdevel

On 2/11/2011 10:10 AM, Aneesh Kumar K. V wrote:
> On Thu, 10 Feb 2011 17:25:11 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
>> In addition, this patch also avoids zero copy for short reads in !dotl case.
>>
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>> ---
>>  net/9p/client.c |   64 +++++++++++++++++++++++++++++++++++-------------------
>>  1 files changed, 41 insertions(+), 23 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index f6d8531..3e51273 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -443,6 +443,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>>  {
>>  	int8_t type;
>>  	int err;
>> +	int ecode;
>>
>>  	err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
>>  	if (err) {
>> @@ -450,36 +451,53 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>>  		return err;
>>  	}
>>
>> -	if (type == P9_RERROR || type == P9_RLERROR) {
>> -		int ecode;
>> -
>> -		if (!p9_is_proto_dotl(c)) {
>> -			char *ename;
>> +	if (type != P9_RERROR && type != P9_RLERROR)
>> +		return 0;
>>
>> -			err = p9pdu_readf(req->rc, c->proto_version, "s?d",
>> -								&ename, &ecode);
>> -			if (err)
>> -				goto out_err;
>> +	if (!p9_is_proto_dotl(c)) {
>> +		char *ename;
>> +
>> +		if (req->tc->pbuf_size) {
>> +			/* Handle user buffers */
>> +			size_t len = req->rc->size - req->rc->offset;
>> +			if (req->tc->pubuf) {
>> +				/* User Buffer */
>> +				err = copy_from_user(
>> +					&req->rc->sdata[req->rc->offset],
>> +					req->tc->pubuf, len);
>> +				if (err) {
>> +					err = -EFAULT;
>> +					return err;
>> +				}
> 
> Will this handle error resulting from kernel_read ?. I guess we have a
> kernel address there.

Yes. It should work. copy_to/from_user works fine..if you recall this how the
code is in the client_read().

- JV

> 
> 
>> +			} else {
>> +				/* Kernel Buffer */
>> +				memmove(&req->rc->sdata[req->rc->offset],
>> +						req->tc->pkbuf, len);
>> +			}
>> +		}
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol.
  2011-02-11  1:25 ` [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol Venkateswararao Jujjuri (JV)
@ 2011-02-11 19:35   ` Aneesh Kumar K. V
  2011-02-11 21:03     ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K. V @ 2011-02-11 19:35 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV), v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)

On Thu, 10 Feb 2011 17:25:10 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
> ---
>  net/9p/client.c   |   47 +++++++++++++++++++++++++++++++++--------------
>  net/9p/protocol.c |   21 +++++++++++++++++++++
>  2 files changed, 54 insertions(+), 14 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index a848bca..f6d8531 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -1270,7 +1270,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>  	if (count < rsize)
>  		rsize = count;
> 
> -	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> +	/* Don't bother zerocopy form small IO (< 1024) */
> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
> +		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
> +				rsize, data, udata);
> +	} else {
> +		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
> +				rsize);
> +	}
>  	if (IS_ERR(req)) {
>  		err = PTR_ERR(req);
>  		goto error;
> @@ -1284,13 +1292,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
> 
>  	P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
> 
> -	if (data) {
> -		memmove(data, dataptr, count);
> -	} else {
> -		err = copy_to_user(udata, dataptr, count);
> -		if (err) {
> -			err = -EFAULT;
> -			goto free_and_error;
> +	if (!req->tc->pbuf_size) {
> +		if (data) {
> +			memmove(data, dataptr, count);
> +		} else {
> +			err = copy_to_user(udata, dataptr, count);
> +			if (err) {
> +				err = -EFAULT;
> +				goto free_and_error;
> +			}
>  		}
>  	}
>  	p9_free_req(clnt, req);
> @@ -1323,12 +1333,21 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
> 
>  	if (count < rsize)
>  		rsize = count;
> -	if (data)
> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
> -								rsize, data);
> -	else
> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
> -								rsize, udata);
> +
> +	/* Don't bother zerocopy form small IO (< 1024) */
> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
> +		req = p9_client_rpc(clnt, P9_TWRITE, "dqE", fid->fid, offset,
> +				rsize, data, udata);
> +	} else {
> +		if (data)
> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
> +					offset, rsize, data);
> +		else
> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
> +					offset, rsize, udata);
> +	}
> +
>  	if (IS_ERR(req)) {
>  		err = PTR_ERR(req);
>  		goto error;
> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
> index 5936c50..830b999 100644
> --- a/net/9p/protocol.c
> +++ b/net/9p/protocol.c
> @@ -114,6 +114,17 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>  	return size - len;
>  }
> 
> +static size_t
> +pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
> +		size_t size)
> +{
> +	size_t len = min(pdu->capacity - pdu->size, size);

Why do we need to do this ? We are not placing anything in the pdu right ?

> +	pdu->pubuf = (char __user *)udata;
> +	pdu->pkbuf = (char *)kdata;
> +	pdu->pbuf_size = len;
> +	return size - len;

Does this mean a zero copy write of a buffer larger than msize will
result in a failure ?

> +}
> +
>  /*
>  	b - int8_t
>  	w - int16_t
> @@ -445,6 +456,16 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
>  					errcode = -EFAULT;
>  			}
>  			break;
> +		case 'E':{
> +				 int32_t cnt = va_arg(ap, int32_t);
> +				 const char *k = va_arg(ap, const void *);
> +				 const char *u = va_arg(ap, const void *);
> +				 errcode = p9pdu_writef(pdu, proto_version, "d",
> +						 cnt);
> +				 if (!errcode && pdu_write_urw(pdu, k, u, cnt))
> +					errcode = -EFAULT;
> +			 }
> +			 break;
>  		case 'U':{
>  				int32_t count = va_arg(ap, int32_t);
>  				const char __user *udata =


-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol.
  2011-02-11 19:35   ` Aneesh Kumar K. V
@ 2011-02-11 21:03     ` Venkateswararao Jujjuri (JV)
  0 siblings, 0 replies; 17+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-11 21:03 UTC (permalink / raw)
  To: Aneesh Kumar K. V; +Cc: v9fs-developer, linux-fsdevel

On 2/11/2011 11:35 AM, Aneesh Kumar K. V wrote:
> On Thu, 10 Feb 2011 17:25:10 -0800, "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com> wrote:
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>> ---
>>  net/9p/client.c   |   47 +++++++++++++++++++++++++++++++++--------------
>>  net/9p/protocol.c |   21 +++++++++++++++++++++
>>  2 files changed, 54 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index a848bca..f6d8531 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -1270,7 +1270,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>  	if (count < rsize)
>>  		rsize = count;
>>
>> -	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
>> +	/* Don't bother zerocopy form small IO (< 1024) */
>> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
>> +				rsize, data, udata);
>> +	} else {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
>> +				rsize);
>> +	}
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> @@ -1284,13 +1292,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>
>>  	P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
>>
>> -	if (data) {
>> -		memmove(data, dataptr, count);
>> -	} else {
>> -		err = copy_to_user(udata, dataptr, count);
>> -		if (err) {
>> -			err = -EFAULT;
>> -			goto free_and_error;
>> +	if (!req->tc->pbuf_size) {
>> +		if (data) {
>> +			memmove(data, dataptr, count);
>> +		} else {
>> +			err = copy_to_user(udata, dataptr, count);
>> +			if (err) {
>> +				err = -EFAULT;
>> +				goto free_and_error;
>> +			}
>>  		}
>>  	}
>>  	p9_free_req(clnt, req);
>> @@ -1323,12 +1333,21 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>>
>>  	if (count < rsize)
>>  		rsize = count;
>> -	if (data)
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>> -								rsize, data);
>> -	else
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>> -								rsize, udata);
>> +
>> +	/* Don't bother zerocopy form small IO (< 1024) */
>> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
>> +		req = p9_client_rpc(clnt, P9_TWRITE, "dqE", fid->fid, offset,
>> +				rsize, data, udata);
>> +	} else {
>> +		if (data)
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
>> +					offset, rsize, data);
>> +		else
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
>> +					offset, rsize, udata);
>> +	}
>> +
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
>> index 5936c50..830b999 100644
>> --- a/net/9p/protocol.c
>> +++ b/net/9p/protocol.c
>> @@ -114,6 +114,17 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>>  	return size - len;
>>  }
>>
>> +static size_t
>> +pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
>> +		size_t size)
>> +{
>> +	size_t len = min(pdu->capacity - pdu->size, size);
> 
> Why do we need to do this ? We are not placing anything in the pdu right ?

But still as per protocol each packet should be <= msize.
I will have another patch series introducing another size bufsize in addition to
msize.
Where bufsize will represent small PDUs (which is <= msize).

> 
>> +	pdu->pubuf = (char __user *)udata;
>> +	pdu->pkbuf = (char *)kdata;
>> +	pdu->pbuf_size = len;
>> +	return size - len;
> 
> Does this mean a zero copy write of a buffer larger than msize will
> result in a failure ?

No; No failures.
If your recall, all our read/write routines doesn't guarantee writing entire
request in one shot..At multiple levels we adjust the read/write size
and the top level takes care of it by dividing the buffer into multiple
reads/writes.
This is not anything new in zero copy .. same thing happens even in the current
code. (pdu_write_u)

- JV
> 
>> +}
>> +
>>  /*
>>  	b - int8_t
>>  	w - int16_t
>> @@ -445,6 +456,16 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
>>  					errcode = -EFAULT;
>>  			}
>>  			break;
>> +		case 'E':{
>> +				 int32_t cnt = va_arg(ap, int32_t);
>> +				 const char *k = va_arg(ap, const void *);
>> +				 const char *u = va_arg(ap, const void *);
>> +				 errcode = p9pdu_writef(pdu, proto_version, "d",
>> +						 cnt);
>> +				 if (!errcode && pdu_write_urw(pdu, k, u, cnt))
>> +					errcode = -EFAULT;
>> +			 }
>> +			 break;
>>  		case 'U':{
>>  				int32_t count = va_arg(ap, int32_t);
>>  				const char __user *udata =
> 
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-02-11 21:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-11  1:25 [RFC-V3] [net/9p] [PATCH 0/7] Zero Copy Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 1/7] [net/9p] Additional elements to p9_fcall to acoomodate zero copy Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 2/7] [net/9p] Adds supporting functions for " Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 3/7] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed Venkateswararao Jujjuri (JV)
2011-02-11  6:59   ` Aneesh Kumar K. V
2011-02-11  1:25 ` [RFC-V3 4/7] [net/9p] Add gup/zero_copy support to VirtIO transport layer Venkateswararao Jujjuri (JV)
2011-02-11  7:07   ` Aneesh Kumar K. V
2011-02-11 16:08     ` Venkateswararao Jujjuri (JV)
2011-02-11 17:58   ` Aneesh Kumar K. V
2011-02-11 18:42     ` Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 5/7] [net/9p] Add preferences to " Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol Venkateswararao Jujjuri (JV)
2011-02-11 19:35   ` Aneesh Kumar K. V
2011-02-11 21:03     ` Venkateswararao Jujjuri (JV)
2011-02-11  1:25 ` [RFC-V3 7/7] [net/9p] Handle TREAD/RERROR case in !dotl case Venkateswararao Jujjuri (JV)
2011-02-11 18:10   ` Aneesh Kumar K. V
2011-02-11 18:46     ` Venkateswararao Jujjuri (JV)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).