* [V2-00/08] [net/9p] ZeroCopy patch series
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri
Changes from V1:
o Added READDIR support for zero copy.
o Fixed an issue with placing kernel buffer on VirtIO ring.
o Replaced size calculation in marshalling routines with a BUG_ON.
Text from old series:
This patch series introduces zero copy capability to the 9p transport layer.
The 9P Linux client makes an additional copy of the read/write buffer into the
kernel before sending it down to the transport layer. There is no functional
need for this additional copy, hence it is eliminated by sending the payload
buffer directly to the transport layer. While this is advantageous to all
transports, it can be further exploited by virtualized transport layers like
VirtIO, which can send the user buffer directly to the server and thereby
achieve real zero copy.
Design Goals.
- Have minimal changes to the net layer so that common code is not polluted by
the transport specifics.
- Create a common transport library which can be used by other transports.
- Avoid additional optimizations in the initial attempt (more details below)
and focus on achieving basic functionality.
Design
This series adds infrastructure to send the payload buffers directly to the
transport layer if the latter prefers it. To accomplish this, a preferences
property is added to the transport layer and additional elements are added to
the PDU structure (struct p9_fcall).
The transport layer specifies its preference through a newly introduced field
in the transport module (clnt->trans_mod->pref), and the net layer sends the
payload through the pubuf/pkbuf elements of struct p9_fcall.
This method has a few advantages.
- Keeps the net layer clean and lets the transport layer deal with specifics.
- Mapping user addresses into kernel pages pins the memory, which could make
the system vulnerable to denial-of-service attacks. This change gives the
transport layer more control to implement effective flow control.
Expect flow control patches shortly.
- If a transport layer doesn't see the need to handle the payload separately,
it can set the preference accordingly so that the current code works with no
changes. This is very useful for transports which have no plans of
converting/pinning user pages.
There is a rather sticky issue with the TREAD/RERROR scenario in non-9P2000.L
protocols (Legacy, 9P2000.u).
If the server has to fail the READ request, it can send an error string of
up to ERRMAX (256) bytes. As this is not a fixed size, it is hard to allocate
a fixed amount of buffer from the transport layer's perspective.
In 9P2000.L, the error is a fixed size (errno), hence not an issue.
On success the received packet will be PDU header + read size + payload.
On error it is PDU header + errno. Hence the non-payload size is constant (11)
irrespective of success or failure.
But this is not the case in non-9P2000.L protocols: the header size is
different in the failure (TREAD/RERROR) case. To take care of this, the patch
makes sure that the read buffer is big enough to accommodate the ERRMAX string.
It also means that there is a chance of scribbling on the payload/user
buffer in the error case for those non-POSIX-compliant protocols.
The added trans_mod->pref gives a transport the option of not participating
in zero copy.
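The size arithmetic above can be sketched in a few lines of C. This is only an illustration, not code from the series: all sketch_* names and SKETCH_ERRMAX are hypothetical, and the field widths are taken from the 9P wire format (size[4] type[1] tag[2], followed by count[4] for Rread or errno[4] for Rlerror; a legacy Rerror carries a length-prefixed string, plus errno[4] in 9P2000.u).

```c
#include <assert.h>
#include <stddef.h>

/* Field widths in bytes, following the 9P wire format. */
enum {
	SKETCH_HDR = 4 + 1 + 2,	/* size[4] type[1] tag[2] */
	SKETCH_COUNT = 4,	/* count[4] in Rread, errno[4] in Rlerror */
	SKETCH_STRLEN = 2,	/* length prefix of a 9P string */
	SKETCH_ERRMAX = 256	/* ERRMAX as quoted in the cover letter */
};

/* Non-payload bytes of a successful Rread: header + count. */
static size_t sketch_rread_overhead(void)
{
	return SKETCH_HDR + SKETCH_COUNT;
}

/* Size of a 9P2000.L Rlerror: header + errno. */
static size_t sketch_rlerror_size(void)
{
	return SKETCH_HDR + SKETCH_COUNT;
}

/* Size of a legacy Rerror; 9P2000.u appends a numeric errno. */
static size_t sketch_rerror_size(size_t ename_len, int dotu)
{
	assert(ename_len <= SKETCH_ERRMAX);
	return SKETCH_HDR + SKETCH_STRLEN + ename_len +
	       (dotu ? SKETCH_COUNT : 0);
}
```

In 9P2000.L both replies have the same 11 non-payload bytes, so the transport can split the receive buffer at a fixed point. A legacy error reply has no such fixed split, which is why the read buffer has to leave room for the ERRMAX string.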
This series also created trans_common.[ch] to house common functions so
that other transport layers can take advantage of them.
Testing/Performance:
Setup: HS21 blade, a two-socket quad-core Xeon with 4 GB memory, IO to the
local disk.
WRITE
dd if=/dev/zero of=/pmnt/file1 bs=4096 count=1MB (variable bs = IO SIZE)
IO SIZE    TOTAL SIZE    No ZC        ZC
1          1MB           22.4 kb/s    19.8 kb/s
32         32MB          711 kb/s     633 kb/s
64         64MB          1.4 mb/s     1.3 mb/s
128        128MB         2.8 mb/s     2.6 mb/s
256        256MB         5.6 mb/s     5.1 mb/s
512        512MB         10.4 mb/s    10.2 mb/s
1024       1GB           19.7 mb/s    20.4 mb/s
2048       2GB           40.1 mb/s    43.7 mb/s
4096       4GB           71.4 mb/s    73.1 mb/s
READ
dd of=/dev/null if=/pmnt/file1 bs=4096 count=1MB(variable bs = IO SIZE)
IO SIZE    TOTAL SIZE    No ZC        ZC
1          1MB           26.6 kb/s    23.1 kb/s
32         32MB          783 kb/s     734 kb/s
64         64MB          1.7 mb/s     1.5 mb/s
128        128MB         3.4 mb/s     3.0 mb/s
256        256MB         4.2 mb/s     5.9 mb/s
512        512MB         6.9 mb/s     11.6 mb/s
1024       1GB           23.3 mb/s    23.4 mb/s
2048       2GB           42.5 mb/s    45.4 mb/s
4096       4GB           67.4 mb/s    73.9 mb/s
ZC benefits are seen from about a 1k buffer onwards. Hence the patch makes
sure that zero copy is not used for smaller IO (the client code requires
rsize > 1024).
My test box could be the bottleneck, as it gave similar numbers even on the
host, but I observed better numbers with zero copy on a bigger setup.
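The decision described above, transport preference plus a size threshold, can be sketched as follows. The three P9_TRANS_PREF_* values are the ones introduced in patch 4/8; sketch_use_zero_copy(), sketch_zero_copy_examples() and SKETCH_ZC_MIN_IO are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Preference bits as introduced in include/net/9p/transport.h (patch 4/8). */
#define P9_TRANS_PREF_PAYLOAD_MASK 0x1
#define P9_TRANS_PREF_PAYLOAD_DEF  0x0
#define P9_TRANS_PREF_PAYLOAD_SEP  0x1

/* Hypothetical name for the literal 1024 used in the client code. */
#define SKETCH_ZC_MIN_IO 1024

/* True if the payload should bypass the PDU and go down separately. */
static bool sketch_use_zero_copy(int pref, size_t rsize)
{
	return (pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
	       P9_TRANS_PREF_PAYLOAD_SEP && rsize > SKETCH_ZC_MIN_IO;
}

/* Usage: the decision table for a few representative inputs. */
static void sketch_zero_copy_examples(void)
{
	assert(sketch_use_zero_copy(P9_TRANS_PREF_PAYLOAD_SEP, 4096));
	assert(!sketch_use_zero_copy(P9_TRANS_PREF_PAYLOAD_SEP, 1024));
	assert(!sketch_use_zero_copy(P9_TRANS_PREF_PAYLOAD_DEF, 4096));
}
```

Note that the comparison is strict, so an IO of exactly 1024 bytes still takes the copying path.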
What is following this patch series (Future work)
1. One of the major advantages of this patch series is that it allows a bigger
msize, to pull off bigger reads/writes from the server. Simply increasing the
msize is not really a solution, as the majority of other transactions are
extremely small, which could result in a waste of kernel heap. To address this
problem we need to have two sizes of PDUs.
2. Add flow-control capability to the transport layer.
3. Add a mount option to disable the zero copy even if the user prefers to.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
* [V2 1/8] [net/9p] Preparation and helper functions for zero copy
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
This patch prepares the p9_fcall structure for zero copy. The added
fields carry the payload buffer information to the transport layer.
In addition it adds a 'private' field for the transport layer to
store mapped/pinned page information so that it can be freed/unpinned
during req_done.
This patch also creates trans_common.[ch] to house helper functions.
It adds the following helper functions.
p9_release_req_pages - Release pages after the transaction.
p9_nr_pages - Return number of pages needed to accommodate the payload.
p9_payload_gup - Translates user buffer into kernel pages.
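The page arithmetic behind the last two helpers can be tried out in user space. The sketch below is an assumption-laden illustration, not the kernel code: it hard-codes a 4 KiB page (SKETCH_PAGE_SHIFT), uses plain integers instead of user pointers, and all sketch_* names are hypothetical. sketch_mapped_len() is an algebraically equivalent simplification of the length computation in the gup helper, covering the case where fewer pages were pinned than requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE ((uintptr_t)1 << SKETCH_PAGE_SHIFT)

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t sketch_nr_pages(uintptr_t addr, size_t len)
{
	uintptr_t first = addr >> SKETCH_PAGE_SHIFT;
	uintptr_t end = (addr + len + SKETCH_PAGE_SIZE - 1) >>
			SKETCH_PAGE_SHIFT;

	return (size_t)(end - first);
}

/* Bytes of the request actually covered when only 'mapped' pages are pinned. */
static size_t sketch_mapped_len(uintptr_t addr, size_t len, size_t mapped)
{
	size_t off = (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
	size_t avail;

	if (mapped == 0)
		return 0;
	avail = mapped * (size_t)SKETCH_PAGE_SIZE - off;
	return len < avail ? len : avail;
}

/* Usage: a 32-byte buffer that straddles a page boundary needs two pages. */
static void sketch_pages_examples(void)
{
	assert(sketch_nr_pages(0x1ff0, 32) == 2);
	assert(sketch_mapped_len(0x1ff0, 32, 2) == 32);
}
```

The second helper matters because pinning can succeed partially; the transport then sends only the covered length rather than the full request.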
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
include/net/9p/9p.h | 8 ++++
net/9p/Makefile | 1 +
net/9p/protocol.c | 4 ++
net/9p/trans_common.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
net/9p/trans_common.h | 29 +++++++++++++++
5 files changed, 135 insertions(+), 0 deletions(-)
create mode 100644 net/9p/trans_common.c
create mode 100644 net/9p/trans_common.h
diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 071fd7a..7aefa6d 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -689,6 +689,10 @@ struct p9_rwstat {
* @tag: transaction id of the request
* @offset: used by marshalling routines to track currentposition in buffer
* @capacity: used by marshalling routines to track total capacity
+ * @pubuf: Payload user buffer given by the caller
+ * @pkbuf: Payload kernel buffer given by the caller
+ * @pbuf_size: pubuf/pkbuf(only one will be !NULL) size to be read/write.
+ * @private: For transport layer's use.
* @sdata: payload
*
* &p9_fcall represents the structure for all 9P RPC
@@ -705,6 +709,10 @@ struct p9_fcall {
size_t offset;
size_t capacity;
+ char __user *pubuf;
+ char *pkbuf;
+ size_t pbuf_size;
+ void *private;
uint8_t *sdata;
};
diff --git a/net/9p/Makefile b/net/9p/Makefile
index 198a640..a0874cc 100644
--- a/net/9p/Makefile
+++ b/net/9p/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o
util.o \
protocol.o \
trans_fd.o \
+ trans_common.o \
9pnet_virtio-objs := \
trans_virtio.o \
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 1e308f2..d888847 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -606,6 +606,10 @@ void p9pdu_reset(struct p9_fcall *pdu)
{
pdu->offset = 0;
pdu->size = 0;
+ pdu->private = NULL;
+ pdu->pubuf = NULL;
+ pdu->pkbuf = NULL;
+ pdu->pbuf_size = 0;
}
int p9dirent_read(char *buf, int len, struct p9_dirent *dirent,
diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
new file mode 100644
index 0000000..ca705f1
--- /dev/null
+++ b/net/9p/trans_common.c
@@ -0,0 +1,93 @@
+/*
+ * Copyright IBM Corporation, 2010
+ * Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <net/9p/9p.h>
+#include <net/9p/client.h>
+#include <linux/scatterlist.h>
+#include "trans_common.h"
+
+/**
+ * p9_release_req_pages - Release pages after the transaction.
+ * @rpinfo: PDU's private page of struct trans_rpage_info
+ */
+void
+p9_release_req_pages(struct trans_rpage_info *rpinfo)
+{
+ int i = 0;
+
+ while (rpinfo->rp_data[i] && rpinfo->rp_nr_pages--) {
+ put_page(rpinfo->rp_data[i]);
+ i++;
+ }
+}
+
+/**
+ * p9_nr_pages - Return number of pages needed to accommodate the payload.
+ */
+int
+p9_nr_pages(struct p9_req_t *req)
+{
+ int start_page, end_page;
+ start_page = (unsigned long long)req->tc->pubuf >> PAGE_SHIFT;
+ end_page = ((unsigned long long)req->tc->pubuf + req->tc->pbuf_size +
+ PAGE_SIZE - 1) >> PAGE_SHIFT;
+ return end_page - start_page;
+}
+
+/**
+ * p9_payload_gup - Translates user buffer into kernel pages and
+ * pins them either for read/write through get_user_pages_fast().
+ * @req: Request to be sent to server.
+ * @pdata_off: data offset into the first page after translation (gup).
+ * @pdata_len: Total length of the IO. gup may not return requested # of pages.
+ * @nr_pages: number of pages to accommodate the payload
+ * @rw: Indicates if the pages are for read or write.
+ */
+int
+p9_payload_gup(struct p9_req_t *req, size_t *pdata_off, int *pdata_len,
+ int nr_pages, u8 rw)
+{
+ uint32_t first_page_bytes = 0;
+ int pdata_mapped_pages;
+ struct trans_rpage_info *rpinfo;
+
+ *pdata_off = (size_t)req->tc->pubuf & (PAGE_SIZE-1);
+
+ if (*pdata_off)
+ first_page_bytes = min((PAGE_SIZE - *pdata_off),
+ req->tc->pbuf_size);
+
+ rpinfo = req->tc->private;
+ pdata_mapped_pages = get_user_pages_fast((unsigned long)req->tc->pubuf,
+ nr_pages, rw, &rpinfo->rp_data[0]);
+
+ if (pdata_mapped_pages < 0) {
+ printk(KERN_ERR "get_user_pages_fast failed:%d udata:%p "
+ "nr_pages:%d\n", pdata_mapped_pages,
+ req->tc->pubuf, nr_pages);
+ pdata_mapped_pages = 0;
+ return -EIO;
+ }
+ rpinfo->rp_nr_pages = pdata_mapped_pages;
+ if (*pdata_off) {
+ *pdata_len = first_page_bytes;
+ *pdata_len += min((req->tc->pbuf_size - *pdata_len),
+ ((size_t)pdata_mapped_pages - 1) << PAGE_SHIFT);
+ } else {
+ *pdata_len = min(req->tc->pbuf_size,
+ (size_t)pdata_mapped_pages << PAGE_SHIFT);
+ }
+ return 0;
+}
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
new file mode 100644
index 0000000..04977e0
--- /dev/null
+++ b/net/9p/trans_common.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright IBM Corporation, 2010
+ * Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+/**
+ * struct trans_rpage_info - To store mapped page information in PDU.
+ * @rp_alloc:Set if this structure is allocd, not a reuse unused space in pdu.
+ * @rp_nr_pages: Number of mapped pages
+ * @rp_data: Array of page pointers
+ */
+struct trans_rpage_info {
+ u8 rp_alloc;
+ int rp_nr_pages;
+ struct page *rp_data[0];
+};
+
+void p9_release_req_pages(struct trans_rpage_info *);
+int p9_payload_gup(struct p9_req_t *, size_t *, int *, int, u8);
+int p9_nr_pages(struct p9_req_t *);
--
1.6.5.2
* [V2 2/8] [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unused.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
This will be used by the transport layer to determine the outgoing
request type. The transport layer uses this information to correctly
place the mapped pages in the PDU. Patches following this will make
use of this to achieve zero copy.
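How a transport might use the request type can be sketched as below. This is an illustration under assumptions: the sketch_* names are hypothetical, and the numeric message types (Treaddir = 40, Tread = 116, Twrite = 118) are quoted from memory of the 9P definitions rather than from this series, so they should be checked against include/net/9p/9p.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message type values; assumed to match include/net/9p/9p.h. */
enum {
	SKETCH_TREADDIR = 40,
	SKETCH_TREAD = 116,
	SKETCH_TWRITE = 118
};

/* Which side of the ring the separate payload belongs to. */
enum sketch_dir {
	SKETCH_NO_PAYLOAD,	/* everything is inside the PDU */
	SKETCH_PAYLOAD_OUT,	/* write data, read by the server */
	SKETCH_PAYLOAD_IN	/* read data, filled in by the server */
};

static enum sketch_dir sketch_payload_dir(int8_t id, size_t pbuf_size)
{
	if (!pbuf_size)
		return SKETCH_NO_PAYLOAD;
	if (id == SKETCH_TWRITE)
		return SKETCH_PAYLOAD_OUT;
	if (id == SKETCH_TREAD || id == SKETCH_TREADDIR)
		return SKETCH_PAYLOAD_IN;
	return SKETCH_NO_PAYLOAD;
}

/* Usage: a 4096-byte write payload goes on the outgoing side. */
static void sketch_dir_examples(void)
{
	assert(sketch_payload_dir(SKETCH_TWRITE, 4096) == SKETCH_PAYLOAD_OUT);
	assert(sketch_payload_dir(SKETCH_TREAD, 4096) == SKETCH_PAYLOAD_IN);
}
```

Without the type stored in the PDU, the transport would have to parse the marshalled header again to learn whether the payload is outgoing or incoming.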
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/protocol.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index d888847..5936c50 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -579,6 +579,7 @@ EXPORT_SYMBOL(p9stat_read);
int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type)
{
+ pdu->id = type;
return p9pdu_writef(pdu, 0, "dbw", 0, type, tag);
}
--
1.6.5.2
* [V2 3/8] [net/9p] Add gup/zero_copy support to VirtIO transport layer.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
Modify p9_virtio_request() and req_done() functions to support
additional payload sent down to the transport layer through
tc->pubuf and tc->pkbuf.
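The way a pinned payload is spread over scatter/gather entries can be sketched in user space as follows. This is a simplified model, not the VirtIO code: struct sketch_seg stands in for a scatterlist entry, a 4 KiB page is assumed, the function returns -1 instead of triggering BUG_ON when the list is full, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Stand-in for one scatter/gather entry. */
struct sketch_seg {
	size_t page;	/* index into the pinned page array */
	size_t offset;	/* byte offset inside that page */
	size_t len;	/* bytes used from that page */
};

/*
 * Split 'count' bytes starting 'first_off' bytes into page 0 into per-page
 * segments. Returns the number of segments, or -1 if 'limit' is too small.
 */
static int sketch_pack_pages(struct sketch_seg *sg, int limit,
			     size_t first_off, size_t count)
{
	int n = 0;
	size_t page = 0;
	size_t off = first_off;

	assert(first_off < SKETCH_PAGE_SIZE);
	while (count > 0) {
		size_t room = SKETCH_PAGE_SIZE - off;
		size_t s = count < room ? count : room;

		if (n >= limit)
			return -1;
		sg[n].page = page++;
		sg[n].offset = off;
		sg[n].len = s;
		n++;
		count -= s;
		off = 0;	/* only the first page starts mid-page */
	}
	return n;
}
```

Only the first segment can start at a non-zero offset; every later segment starts at the beginning of its page, and only the last one can be shorter than a page.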
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/trans_common.h | 3 +
net/9p/trans_virtio.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 126 insertions(+), 5 deletions(-)
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
index 04977e0..7630922 100644
--- a/net/9p/trans_common.h
+++ b/net/9p/trans_common.h
@@ -12,6 +12,9 @@
*
*/
+/* TRUE if it is user context */
+#define P9_IS_USER_CONTEXT (!segment_eq(get_fs(), KERNEL_DS))
+
/**
* struct trans_rpage_info - To store mapped page information in PDU.
* @rp_alloc:Set if this structure is allocd, not a reuse unused space in pdu.
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index c8f3f72..4b236de 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -45,6 +45,7 @@
#include <linux/scatterlist.h>
#include <linux/virtio.h>
#include <linux/virtio_9p.h>
+#include "trans_common.h"
#define VIRTQUEUE_NUM 128
@@ -155,6 +156,14 @@ static void req_done(struct virtqueue *vq)
rc->tag);
req = p9_tag_lookup(chan->client, rc->tag);
req->status = REQ_STATUS_RCVD;
+ if (req->tc->private) {
+ struct trans_rpage_info *rp = req->tc->private;
+ /*Release pages */
+ p9_release_req_pages(rp);
+ if (rp->rp_alloc)
+ kfree(rp);
+ req->tc->private = NULL;
+ }
p9_client_cb(chan->client, req);
} else {
spin_unlock_irqrestore(&chan->lock, flags);
@@ -203,6 +212,38 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
}
/**
+ * pack_sg_list_p - Just like pack_sg_list. Instead of taking a buffer,
+ * this takes a list of pages.
+ * @sg: scatter/gather list to pack into
+ * @start: which segment of the sg_list to start at
+ * @pdata_off: Offset into the first page
+ * @**pdata: a list of pages to add into sg.
+ * @count: amount of data to pack into the scatter/gather list
+ */
+static int
+pack_sg_list_p(struct scatterlist *sg, int start, int limit, size_t pdata_off,
+ struct page **pdata, int count)
+{
+ int s;
+ int i = 0;
+ int index = start;
+
+ if (pdata_off) {
+ s = min((int)(PAGE_SIZE - pdata_off), count);
+ sg_set_page(&sg[index++], pdata[i++], s, pdata_off);
+ count -= s;
+ }
+
+ while (count) {
+ BUG_ON(index > limit);
+ s = min((int)PAGE_SIZE, count);
+ sg_set_page(&sg[index++], pdata[i++], s, 0);
+ count -= s;
+ }
+ return index-start;
+}
+
+/**
* p9_virtio_request - issue a request
* @client: client instance issuing the request
* @req: request to be issued
@@ -212,22 +253,97 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
static int
p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
{
- int in, out;
+ int in, out, inp, outp;
struct virtio_chan *chan = client->trans;
char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
unsigned long flags;
- int err;
+ size_t pdata_off = 0;
+ struct trans_rpage_info *rpinfo = NULL;
+ int err, pdata_len = 0;
P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
req_retry:
req->status = REQ_STATUS_SENT;
+ if (req->tc->pbuf_size && (req->tc->pubuf && P9_IS_USER_CONTEXT)) {
+ int nr_pages = p9_nr_pages(req);
+ int rpinfo_size = sizeof(struct trans_rpage_info) +
+ sizeof(struct page *) * nr_pages;
+
+ if (rpinfo_size <= (req->tc->capacity - req->tc->size)) {
+ /* We can use sdata */
+ req->tc->private = req->tc->sdata + req->tc->size;
+ rpinfo = (struct trans_rpage_info *)req->tc->private;
+ rpinfo->rp_alloc = 0;
+ } else {
+ req->tc->private = kmalloc(rpinfo_size, GFP_NOFS);
+ if (!req->tc->private) {
+ P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: "
+ "private kmalloc returned NULL");
+ return -ENOMEM;
+ }
+ rpinfo = (struct trans_rpage_info *)req->tc->private;
+ rpinfo->rp_alloc = 1;
+ }
+
+ err = p9_payload_gup(req, &pdata_off, &pdata_len, nr_pages,
+ req->tc->id == P9_TREAD ? 1 : 0);
+ if (err < 0) {
+ if (rpinfo->rp_alloc)
+ kfree(rpinfo);
+ return err;
+ }
+ }
+
spin_lock_irqsave(&chan->lock, flags);
+
+ /* Handle out VirtIO ring buffers */
out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
- req->tc->size);
- in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
- client->msize);
+ req->tc->size);
+
+ if (req->tc->pbuf_size && (req->tc->id == P9_TWRITE)) {
+ /* We have additional write payload buffer to take care */
+ if (req->tc->pubuf && P9_IS_USER_CONTEXT) {
+ outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
+ pdata_off, rpinfo->rp_data, pdata_len);
+ } else {
+ char *pbuf = req->tc->pubuf ? req->tc->pubuf :
+ req->tc->pkbuf;
+ outp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, pbuf,
+ req->tc->pbuf_size);
+ }
+ out += outp;
+ }
+
+ /* Handle in VirtIO ring buffers */
+ if (req->tc->pbuf_size &&
+ ((req->tc->id == P9_TREAD) || (req->tc->id == P9_TREADDIR))) {
+ /*
+ * Take care of additional Read payload.
+ * 11 is the read/write header = PDU Header(7) + IO Size (4).
+ * Arrange in such a way that server places header in the
+ * alloced memory and payload onto the user buffer.
+ */
+ inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
+ /*
+ * Running executables in the filesystem may result in
+ * a read request with kernel buffer as opposed to user buffer.
+ */
+ if (req->tc->pubuf && P9_IS_USER_CONTEXT) {
+ in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
+ pdata_off, rpinfo->rp_data, pdata_len);
+ } else {
+ char *pbuf = req->tc->pubuf ? req->tc->pubuf :
+ req->tc->pkbuf;
+ in = pack_sg_list(chan->sg, out+inp, VIRTQUEUE_NUM,
+ pbuf, req->tc->pbuf_size);
+ }
+ in += inp;
+ } else {
+ in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
+ client->msize);
+ }
err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
if (err < 0) {
@@ -246,6 +362,8 @@ req_retry:
P9_DPRINTK(P9_DEBUG_TRANS,
"9p debug: "
"virtio rpc add_buf returned failure");
+ if (rpinfo && rpinfo->rp_alloc)
+ kfree(rpinfo);
return -EIO;
}
}
--
1.6.5.2
* [V2 4/8] [net/9p] Add preferences to transport layer.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
This patch adds a preferences field to the p9_trans_module.
Through this, the transport layer can now express its preference about the
payload, i.e. whether the payload needs to be part of the PDU or whether it
prefers the payload to be sent separately so that the transport layer can
handle it in a better way.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
include/net/9p/transport.h | 9 +++++++++
net/9p/trans_virtio.c | 1 +
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 6d5886e..82868f1 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -26,11 +26,19 @@
#ifndef NET_9P_TRANSPORT_H
#define NET_9P_TRANSPORT_H
+#define P9_TRANS_PREF_PAYLOAD_MASK 0x1
+
+/* Default. Add Payload to PDU before sending it down to transport layer */
+#define P9_TRANS_PREF_PAYLOAD_DEF 0x0
+/* Send payload separately to transport layer along with PDU. */
+#define P9_TRANS_PREF_PAYLOAD_SEP 0x1
+
/**
* struct p9_trans_module - transport module interface
* @list: used to maintain a list of currently available transports
* @name: the human-readable name of the transport
* @maxsize: transport provided maximum packet size
+ * @pref: Preferences of this transport
* @def: set if this transport should be considered the default
* @create: member function to create a new connection on this transport
* @request: member function to issue a request to the transport
@@ -47,6 +55,7 @@ struct p9_trans_module {
struct list_head list;
char *name; /* name of transport */
int maxsize; /* max message size of transport */
+ int pref; /* Preferences of this transport */
int def; /* this transport should be default */
struct module *owner;
int (*create)(struct p9_client *, const char *, char *);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 4b236de..9b550ed 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -566,6 +566,7 @@ static struct p9_trans_module p9_virtio_trans = {
.request = p9_virtio_request,
.cancel = p9_virtio_cancel,
.maxsize = PAGE_SIZE*16,
+ .pref = P9_TRANS_PREF_PAYLOAD_SEP,
.def = 0,
.owner = THIS_MODULE,
};
--
1.6.5.2
* [V2 5/8] [net/9p] Read side zerocopy changes for 9P2000.L protocol.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
Modify p9_client_read() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/client.c | 26 ++++++++++++++++++--------
net/9p/protocol.c | 21 +++++++++++++++++++++
2 files changed, 39 insertions(+), 8 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index a848bca..82079f9 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1270,7 +1270,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
if (count < rsize)
rsize = count;
- req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
+ /* Don't bother with zerocopy for small IO (<= 1024) */
+ if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
+ P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
+ req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
+ rsize, data, udata);
+ } else {
+ req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
+ rsize);
+ }
if (IS_ERR(req)) {
err = PTR_ERR(req);
goto error;
@@ -1284,13 +1292,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
- if (data) {
- memmove(data, dataptr, count);
- } else {
- err = copy_to_user(udata, dataptr, count);
- if (err) {
- err = -EFAULT;
- goto free_and_error;
+ if (!req->tc->pbuf_size) {
+ if (data) {
+ memmove(data, dataptr, count);
+ } else {
+ err = copy_to_user(udata, dataptr, count);
+ if (err) {
+ err = -EFAULT;
+ goto free_and_error;
+ }
}
}
p9_free_req(clnt, req);
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 5936c50..7bca242 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -114,6 +114,17 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
return size - len;
}
+static size_t
+pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
+ size_t size)
+{
+ BUG_ON(pdu->size > P9_IOHDRSZ);
+ pdu->pubuf = (char __user *)udata;
+ pdu->pkbuf = (char *)kdata;
+ pdu->pbuf_size = size;
+ return 0;
+}
+
/*
b - int8_t
w - int16_t
@@ -445,6 +456,16 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
errcode = -EFAULT;
}
break;
+ case 'E':{
+ int32_t cnt = va_arg(ap, int32_t);
+ const char *k = va_arg(ap, const void *);
+ const char *u = va_arg(ap, const void *);
+ errcode = p9pdu_writef(pdu, proto_version, "d",
+ cnt);
+ if (!errcode && pdu_write_urw(pdu, k, u, cnt))
+ errcode = -EFAULT;
+ }
+ break;
case 'U':{
int32_t count = va_arg(ap, int32_t);
const char __user *udata =
--
1.6.5.2
* [V2 6/8] [net/9p] Write side zerocopy changes for 9P2000.L protocol.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
Modify p9_client_write() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/client.c | 21 +++++++++++++++------
1 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 82079f9..412c52e 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1333,12 +1333,21 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
if (count < rsize)
rsize = count;
- if (data)
- req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
- rsize, data);
- else
- req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
- rsize, udata);
+
+ /* Don't bother with zerocopy for small IO (<= 1024) */
+ if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
+ P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
+ req = p9_client_rpc(clnt, P9_TWRITE, "dqE", fid->fid, offset,
+ rsize, data, udata);
+ } else {
+
+ if (data)
+ req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
+ offset, rsize, data);
+ else
+ req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
+ offset, rsize, udata);
+ }
if (IS_ERR(req)) {
err = PTR_ERR(req);
goto error;
--
1.6.5.2
* [V2 7/8] [net/9p] readdir zerocopy changes for 9P2000.L protocol.
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
Modify p9_client_readdir() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/client.c | 11 +++++++++--
net/9p/protocol.c | 18 ++++++++++++++++++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 412c52e..6e07ef4 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1735,7 +1735,14 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
if (count < rsize)
rsize = count;
- req = p9_client_rpc(clnt, P9_TREADDIR, "dqd", fid->fid, offset, rsize);
+ if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
+ P9_TRANS_PREF_PAYLOAD_SEP) {
+ req = p9_client_rpc(clnt, P9_TREADDIR, "dqF", fid->fid,
+ offset, rsize, data);
+ } else {
+ req = p9_client_rpc(clnt, P9_TREADDIR, "dqd", fid->fid,
+ offset, rsize);
+ }
if (IS_ERR(req)) {
err = PTR_ERR(req);
goto error;
@@ -1749,7 +1756,7 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
P9_DPRINTK(P9_DEBUG_9P, "<<< RREADDIR count %d\n", count);
- if (data)
+ if (!req->tc->pbuf_size && data)
memmove(data, dataptr, count);
p9_free_req(clnt, req);
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 7bca242..2ce515b 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -125,6 +125,15 @@ pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
return 0;
}
+static size_t
+pdu_write_readdir(struct p9_fcall *pdu, const char *kdata, size_t size)
+{
+ BUG_ON(pdu->size > P9_READDIRHDRSZ);
+ pdu->pkbuf = (char *)kdata;
+ pdu->pbuf_size = size;
+ return 0;
+}
+
/*
b - int8_t
w - int16_t
@@ -466,6 +475,15 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
errcode = -EFAULT;
}
break;
+ case 'F':{
+ int32_t cnt = va_arg(ap, int32_t);
+ const char *k = va_arg(ap, const void *);
+ errcode = p9pdu_writef(pdu, proto_version, "d",
+ cnt);
+ if (!errcode && pdu_write_readdir(pdu, k, cnt))
+ errcode = -EFAULT;
+ }
+ break;
case 'U':{
int32_t count = va_arg(ap, int32_t);
const char __user *udata =
--
1.6.5.2
* [V2 8/8] [net/9p] Handle TREAD/RERROR case in !dotl case.
2011-02-17 21:33 [V2-00/08] [net/9p] ZeroCopy patch series Venkateswararao Jujjuri (JV)
` (6 preceding siblings ...)
2011-02-17 21:33 ` [V2 7/8] [net/9p] readdir " Venkateswararao Jujjuri (JV)
@ 2011-02-17 21:33 ` Venkateswararao Jujjuri (JV)
7 siblings, 0 replies; 9+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2011-02-17 21:33 UTC (permalink / raw)
To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri (JV)
This is a rather sticky issue to deal with in non-9P2000.L protocols.
In 9P2000.L the error is a fixed-size errno, so it is not an issue.
In the other protocols it is a string of size up to ERRMAX.
To handle the TREAD/RERROR scenario in the !dotl case, this change makes sure
that the read buffer is big enough to accommodate an ERRMAX string.
It also means that there is a chance of scribbling on the payload/user
buffer in the error case for those non-POSIX-compliant protocols.
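As a rough illustration of the sizing rule described above, the arithmetic can be sketched as follows. The helper names and the ERRMAX_MODEL value of 128 are assumptions made for this example; the only protocol facts relied on are that a 9P string carries a 2-byte length prefix and that 9P2000.u appends a 4-byte errno to RERROR.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define ERRMAX_MODEL   128	/* assumed maximum error-string length */
#define STR_LEN_PREFIX 2	/* 9P strings carry a 2-byte length */
#define ERRNO_FIELD    4	/* 9P2000.u appends a 4-byte errno */

/* Smallest reply space that can hold a string-style RERROR body. */
static size_t min_rerror_space(void)
{
	return STR_LEN_PREFIX + ERRMAX_MODEL + ERRNO_FIELD;
}

/* Hypothetical helper: space to reserve for a read of 'rsize' bytes. */
static size_t read_buf_space(size_t rsize, int is_dotl)
{
	size_t need = min_rerror_space();

	assert(rsize > 0);
	if (is_dotl)
		return rsize;	/* RLERROR is a fixed-size errno */
	return rsize < need ? need : rsize;
}
```

Under these assumptions a 16-byte read on a non-.L mount would still reserve 134 bytes, while a 4096-byte read is unaffected.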
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
---
net/9p/client.c | 64 +++++++++++++++++++++++++++++++++++-------------------
1 files changed, 41 insertions(+), 23 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c
index 6e07ef4..251abb1 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -443,6 +443,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
{
int8_t type;
int err;
+ int ecode;
err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
if (err) {
@@ -450,36 +451,53 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
return err;
}
- if (type == P9_RERROR || type == P9_RLERROR) {
- int ecode;
-
- if (!p9_is_proto_dotl(c)) {
- char *ename;
+ if (type != P9_RERROR && type != P9_RLERROR)
+ return 0;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
- &ename, &ecode);
- if (err)
- goto out_err;
+ if (!p9_is_proto_dotl(c)) {
+ char *ename;
+
+ if (req->tc->pbuf_size) {
+ /* Handle user buffers */
+ size_t len = req->rc->size - req->rc->offset;
+ if (req->tc->pubuf) {
+ /* User Buffer */
+ err = copy_from_user(
+ &req->rc->sdata[req->rc->offset],
+ req->tc->pubuf, len);
+ if (err) {
+ err = -EFAULT;
+ goto out_err;
+ }
+ } else {
+ /* Kernel Buffer */
+ memmove(&req->rc->sdata[req->rc->offset],
+ req->tc->pkbuf, len);
+ }
+ }
+ err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ &ename, &ecode);
+ if (err)
+ goto out_err;
- if (p9_is_proto_dotu(c))
- err = -ecode;
+ if (p9_is_proto_dotu(c))
+ err = -ecode;
- if (!err || !IS_ERR_VALUE(err)) {
- err = p9_errstr2errno(ename, strlen(ename));
+ if (!err || !IS_ERR_VALUE(err)) {
+ err = p9_errstr2errno(ename, strlen(ename));
- P9_DPRINTK(P9_DEBUG_9P, "<<< RERROR (%d) %s\n", -ecode, ename);
+ P9_DPRINTK(P9_DEBUG_9P, "<<< RERROR (%d) %s\n", -ecode,
+ ename);
- kfree(ename);
- }
- } else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
- err = -ecode;
-
- P9_DPRINTK(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
+ kfree(ename);
}
+ } else {
+ err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = -ecode;
+
+ P9_DPRINTK(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
+ }
- } else
- err = 0;
return err;
--
1.6.5.2
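The copy-back step in p9_check_errors() can be modelled in user space along these lines. This is a sketch under stated assumptions: struct reply_model, pull_error_body() and read_ename() are hypothetical names, and only the kernel-buffer (memmove) branch is modelled; the user-buffer branch in the patch uses copy_from_user() instead. The idea shown is that the error body the transport left in the payload buffer is moved back behind the header, after which the usual string parsing can run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the reply PDU (struct p9_fcall). */
struct reply_model {
	uint8_t sdata[256];	/* header followed by the message body */
	size_t  size;		/* total message size reported by the server */
	size_t  offset;		/* first byte after the parsed header */
};

/* Copy the error body that landed in the payload buffer back behind the
 * header. Returns 0 on success, -1 if the sizes are inconsistent. */
static int pull_error_body(struct reply_model *rc, const uint8_t *payload)
{
	size_t len;

	assert(payload != NULL);
	if (rc->size < rc->offset || rc->size > sizeof(rc->sdata))
		return -1;
	len = rc->size - rc->offset;
	memmove(&rc->sdata[rc->offset], payload, len);
	return 0;
}

/* Decode a 9P string (2-byte little-endian length, then bytes) at the
 * current offset into 'out'. Returns 0 on success, -1 on bad input. */
static int read_ename(struct reply_model *rc, char *out, size_t outsz)
{
	size_t n;

	if (rc->offset + 2 > rc->size)
		return -1;
	n = rc->sdata[rc->offset] | ((size_t)rc->sdata[rc->offset + 1] << 8);
	if (rc->offset + 2 + n > rc->size || n + 1 > outsz)
		return -1;
	memcpy(out, &rc->sdata[rc->offset + 2], n);
	out[n] = '\0';
	rc->offset += 2 + n;
	return 0;
}
```

Unlike the patch, the model bounds-checks the length before copying; the patch instead relies on the read buffer having been sized to hold an ERRMAX string.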