From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Venkateswararao Jujjuri (JV)"
Subject: Re: [V9fs-developer] :[RFC] [PATCH 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol.
Date: Tue, 08 Feb 2011 15:50:40 -0800
Message-ID: <4D51D6D0.6040200@linux.vnet.ibm.com>
References: <1297063283-2180-1-git-send-email-jvrao@linux.vnet.ibm.com>
	<1297063283-2180-7-git-send-email-jvrao@linux.vnet.ibm.com>
	<4D4F97ED.9070401@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org, v9fs-developer@lists.sourceforge.net
To: Eric Van Hensbergen
Return-path:
Received: from e31.co.us.ibm.com ([32.97.110.149]:48643 "EHLO e31.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753860Ab1BHXuq
	(ORCPT ); Tue, 8 Feb 2011 18:50:46 -0500
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106])
	by e31.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p18Na6wG030833
	for ; Tue, 8 Feb 2011 16:36:06 -0700
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])
	by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p18Noia2118936
	for ; Tue, 8 Feb 2011 16:50:44 -0700
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])
	by d03av02.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p18NohpN024210
	for ; Tue, 8 Feb 2011 16:50:43 -0700
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 2/8/2011 1:09 PM, Eric Van Hensbergen wrote:
> One thing I wonder is if we always want to zero copy for payload. In
> the extreme, do we want to take the overhead of pinning an extra page
> if we are only reading/writing a byte? memcpy is expensive for large
> packets, but may actually be more efficient for small packets.

I am not a memory expert, but I would assume memcpy also needs to do
something similar to what get_user_pages() does, short of pinning the
pages. But I see the point.

>
> Have we done any performance measurements of this code with various
> payload sizes versus non-zero-copy? Of course that may not really
> show the impact of pinning the extra pages....

All our testing so far has been with large buffers; we have not tested
with small buffers.

>
> In any case, if such a tradeoff did exist, we might choose to not do
> zero copy for requests smaller than some size -- and that might
> alleviate some of the problems with the legacy protocols (such as the
> Rerror issue) -- killing two birds with one stone. In any case, given

I think it is a wise decision to avoid zero copy if iosize + hdr_size <=
pagesize (a rough sketch of what I mean follows at the end of this
reply). But it doesn't change any of today's complexity, except perhaps
saving an if condition.

> the implementations it should be really easy to shut me up with
> comparison data of zc and non-zc for 1, 64, 128, 256, 512, 1024, 2048,
> 4192 byte payloads (without caches enabled of course).

I think this is a good experiment; we will publish the data.
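To make the idea concrete, here is a minimal, hypothetical sketch of the
size-based fallback being discussed. The helper name and the exact
threshold are made up for this example; P9_IOHDRSZ and the
P9_TRANS_PREF_PAYLOAD_* flags are the ones used elsewhere in this series,
but this is not code from the patch:

/* Hypothetical helper, illustration only: skip zero copy when the whole
 * request (payload + 9P I/O header) fits in one page, so we never pin an
 * extra page for tiny I/Os. */
static int p9_payload_use_zc(struct p9_client *clnt, int rsize)
{
	if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) !=
					P9_TRANS_PREF_PAYLOAD_SEP)
		return 0;	/* transport cannot take a separate payload */
	if (rsize + P9_IOHDRSZ <= PAGE_SIZE)
		return 0;	/* small request: a memcpy is likely cheaper */
	return 1;		/* large request: zero copy should pay off */
}

p9_client_read()/p9_client_write() could then pick the "dqE"/"dqF" formats
only when this returns non-zero and keep the existing "dqd"/"dqD"/"dqU"
paths otherwise.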
>
> -eric
>
>
> On Mon, Feb 7, 2011 at 12:57 AM, Venkateswararao Jujjuri (JV)
> wrote:
>> On 2/6/2011 11:21 PM, Venkateswararao Jujjuri (JV) wrote:
>>> Signed-off-by: Venkateswararao Jujjuri
>>> ---
>>>  net/9p/client.c   |   45 +++++++++++++++++++++++++++++++--------------
>>>  net/9p/protocol.c |   38 ++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 69 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>> index a848bca..f939edf 100644
>>> --- a/net/9p/client.c
>>> +++ b/net/9p/client.c
>>> @@ -1270,7 +1270,14 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>>         if (count < rsize)
>>>                 rsize = count;
>>>
>>> -       req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
>>> +       if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>>> +                       P9_TRANS_PREF_PAYLOAD_SEP) {
>>> +               req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
>>> +                               rsize, data ? data : udata);
>>> +       } else {
>>> +               req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
>>> +                               rsize);
>>> +       }
>>>         if (IS_ERR(req)) {
>>>                 err = PTR_ERR(req);
>>>                 goto error;
>>> @@ -1284,13 +1291,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>>
>>>         P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
>>>
>>> -       if (data) {
>>> -               memmove(data, dataptr, count);
>>> -       } else {
>>> -               err = copy_to_user(udata, dataptr, count);
>>> -               if (err) {
>>> -                       err = -EFAULT;
>>> -                       goto free_and_error;
>>> +       if (!req->tc->pbuf_size) {
>>> +               if (data) {
>>> +                       memmove(data, dataptr, count);
>>> +               } else {
>>> +                       err = copy_to_user(udata, dataptr, count);
>>> +                       if (err) {
>>> +                               err = -EFAULT;
>>> +                               goto free_and_error;
>>> +                       }
>>>                 }
>>>         }
>>>         p9_free_req(clnt, req);
>>> @@ -1323,12 +1332,20 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>>>
>>>         if (count < rsize)
>>>                 rsize = count;
>>> -       if (data)
>>> -               req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>>> -                               rsize, data);
>>> -       else
>>> -               req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>>> -                               rsize, udata);
>>> +
>>> +       if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>>> +                       P9_TRANS_PREF_PAYLOAD_SEP) {
>>> +               req = p9_client_rpc(clnt, P9_TWRITE, "dqF", fid->fid, offset,
>>> +                               rsize, data ? data : udata);
>>> +       } else {
>>> +               if (data)
>>> +                       req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
>>> +                                       offset, rsize, data);
>>> +               else
>>> +                       req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
>>> +                                       offset, rsize, udata);
>>> +       }
>>> +
>>>         if (IS_ERR(req)) {
>>>                 err = PTR_ERR(req);
>>>                 goto error;
>>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
>>> index dfc358f..ea778dd 100644
>>> --- a/net/9p/protocol.c
>>> +++ b/net/9p/protocol.c
>>> @@ -114,6 +114,24 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>>>         return size - len;
>>>  }
>>>
>>> +static size_t
>>> +pdu_write_uw(struct p9_fcall *pdu, const char *udata, size_t size)
>>> +{
>>> +       size_t len = min(pdu->capacity - pdu->size, size);
>>> +       pdu->pbuf = udata;
>>> +       pdu->pbuf_size = len;
>>> +       return size - len;
>>> +}
>>> +
>>> +static size_t
>>> +pdu_write_ur(struct p9_fcall *pdu, const char *udata, size_t size)
>>> +{
>>> +       size_t len = min(pdu->capacity - pdu->size, size);
>>> +       pdu->pbuf = udata;
>>> +       pdu->pbuf_size = len;
>>> +       return size - len;
>>> +}
>>> +
>>>  /*
>>>         b - int8_t
>>>         w - int16_t
>>> @@ -445,6 +463,26 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
>>>                                 errcode = -EFAULT;
>>>                         }
>>>                         break;
>>> +               case 'E':{
>>> +                               int32_t count = va_arg(ap, int32_t);
>>> +                               const char *udata = va_arg(ap, const void *);
>>> +                               errcode = p9pdu_writef(pdu, proto_version, "d",
>>> +                                               count);
>>> +                               if (!errcode && pdu_write_ur(pdu, udata,
>>> +                                               count))
>>> +                                       errcode = -EFAULT;
>>> +                       }
>>> +                       break;
>>> +               case 'F':{
>>> +                               int32_t count = va_arg(ap, int32_t);
>>> +                               const char *udata = va_arg(ap, const void *);
>>> +                               errcode = p9pdu_writef(pdu, proto_version, "d",
>>> +                                               count);
>>> +                               if (!errcode && pdu_write_uw(pdu, udata,
>>> +                                               count))
>>> +                                       errcode = -EFAULT;
>>> +                       }
>>> +                       break;
>>>                 case 'U':{
>>>                                 int32_t count = va_arg(ap, int32_t);
>>>                                 const char __user *udata =
>>
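For anyone following along, here is a rough, hypothetical sketch of what a
transport could do with pdu->pbuf and pdu->pbuf_size on the write path once
the payload is carried separately. This is not code from the series; the
helper name is invented, and error handling (including unpinning the pages
that were pinned) is mostly omitted:

/* Illustration only: pin the user pages behind pdu->pbuf and describe them
 * in a scatter-gather list so the payload can be sent without an extra
 * copy.  A real implementation must put_page() everything it pinned once
 * the request completes, and on the error paths below. */
static int p9_payload_to_sg(struct p9_fcall *pdu, struct scatterlist *sg,
			    int start, struct page **pages, int max_pages)
{
	unsigned long addr = (unsigned long)pdu->pbuf;
	size_t len = pdu->pbuf_size;
	int i, nr_pages, pinned;

	nr_pages = DIV_ROUND_UP(offset_in_page(addr) + len, PAGE_SIZE);
	if (nr_pages > max_pages)
		return -ENOMEM;

	/* write == 0: we only read from the user buffer (TWRITE direction) */
	pinned = get_user_pages_fast(addr, nr_pages, 0, pages);
	if (pinned < nr_pages)
		return pinned < 0 ? pinned : -EFAULT;

	for (i = 0; i < nr_pages; i++) {
		size_t plen = min_t(size_t, len,
				    PAGE_SIZE - offset_in_page(addr));

		sg_set_page(&sg[start + i], pages[i], plen,
			    offset_in_page(addr));
		addr += plen;
		len -= plen;
	}
	return nr_pages;
}

The read side would be symmetric, except that the pages are pinned with
write access so the transport can copy the RREAD payload into them.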