From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Van Hensbergen Subject: Re: [V9fs-developer] :[RFC] [PATCH 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol. Date: Tue, 8 Feb 2011 15:16:37 -0600 Message-ID: References: <1297063283-2180-1-git-send-email-jvrao@linux.vnet.ibm.com> <1297063283-2180-7-git-send-email-jvrao@linux.vnet.ibm.com> <4D4F97ED.9070401@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: linux-fsdevel@vger.kernel.org, v9fs-developer@lists.sourceforge.net To: "Venkateswararao Jujjuri (JV)" Return-path: Received: from mail-fx0-f46.google.com ([209.85.161.46]:55046 "EHLO mail-fx0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751040Ab1BHVQj convert rfc822-to-8bit (ORCPT ); Tue, 8 Feb 2011 16:16:39 -0500 Received: by fxm20 with SMTP id 20so6621097fxm.19 for ; Tue, 08 Feb 2011 13:16:37 -0800 (PST) In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: oh and, for reference, while a different environment, my request is based on a little bit more than idle fancy. Check out the graph on page 6 in: http://citeseerx.ist.psu.edu/viewdoc/download?doi=3D10.1.1.1= 08.8182&rep=3Drep1&type=3Dpdf -eric On Tue, Feb 8, 2011 at 3:09 PM, Eric Van Hensbergen = wrote: > One thing I wonder is if we always want to zero copy for payload. =A0= In > the extreme, do we want to take the overhead of pinning an extra page > if we are only reading/writing a byte? =A0memcpy is expensive for lar= ge > packets, but may actually be more efficient for small packets. > > Have we done any performance measurements of this code with various > payload sizes versus non-zero-copy? =A0Of course that may not really > show the impact of pinning the extra pages.... > > In any case, if such a tradeoff did exist, we might choose to not do > zero copy for requests smaller than some size -- and that might > alleviate some of the problems with the legacy protocols (such as the > Rerror issue) -- killing two birds with one stone. =A0In any case, gi= ven > the implementations it should be really easy to shut me up with > comparison data of zc and non-zc for 1, 64, 128, 256, 512, 1024, 2048= , > 4192 byte payloads (without caches enabled of course). > > =A0 =A0 =A0 -eric > > > On Mon, Feb 7, 2011 at 12:57 AM, Venkateswararao Jujjuri (JV) > wrote: >> On 2/6/2011 11:21 PM, Venkateswararao Jujjuri (JV) wrote: >>> Signed-off-by: Venkateswararao Jujjuri >>> --- >>> =A0net/9p/client.c =A0 | =A0 45 +++++++++++++++++++++++++++++++----= ---------- >>> =A0net/9p/protocol.c | =A0 38 +++++++++++++++++++++++++++++++++++++= + >>> =A02 files changed, 69 insertions(+), 14 deletions(-) >>> >>> diff --git a/net/9p/client.c b/net/9p/client.c >>> index a848bca..f939edf 100644 >>> --- a/net/9p/client.c >>> +++ b/net/9p/client.c >>> @@ -1270,7 +1270,14 @@ p9_client_read(struct p9_fid *fid, char *dat= a, char __user *udata, u64 offset, >>> =A0 =A0 =A0 if (count < rsize) >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize =3D count; >>> >>> - =A0 =A0 req =3D p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, of= fset, rsize); >>> + =A0 =A0 if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) = =3D=3D >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 P9_TRANS_PREF_PAYLOAD_SEP= ) { >>> + =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(clnt, P9_TREAD, "dq= E", fid->fid, offset, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize, da= ta ? data : udata); >>> + =A0 =A0 } else { >>> + =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(clnt, P9_TREAD, "dq= d", fid->fid, offset, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize); >>> + =A0 =A0 } >>> =A0 =A0 =A0 if (IS_ERR(req)) { >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D PTR_ERR(req); >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto error; >>> @@ -1284,13 +1291,15 @@ p9_client_read(struct p9_fid *fid, char *da= ta, char __user *udata, u64 offset, >>> >>> =A0 =A0 =A0 P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count); >>> >>> - =A0 =A0 if (data) { >>> - =A0 =A0 =A0 =A0 =A0 =A0 memmove(data, dataptr, count); >>> - =A0 =A0 } else { >>> - =A0 =A0 =A0 =A0 =A0 =A0 err =3D copy_to_user(udata, dataptr, coun= t); >>> - =A0 =A0 =A0 =A0 =A0 =A0 if (err) { >>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D -EFAULT; >>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto free_and_error; >>> + =A0 =A0 if (!req->tc->pbuf_size) { >>> + =A0 =A0 =A0 =A0 =A0 =A0 if (data) { >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 memmove(data, dataptr, co= unt); >>> + =A0 =A0 =A0 =A0 =A0 =A0 } else { >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D copy_to_user(udat= a, dataptr, count); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (err) { >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D -= EFAULT; >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto free= _and_error; >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 } >>> =A0 =A0 =A0 } >>> =A0 =A0 =A0 p9_free_req(clnt, req); >>> @@ -1323,12 +1332,20 @@ p9_client_write(struct p9_fid *fid, char *d= ata, const char __user *udata, >>> >>> =A0 =A0 =A0 if (count < rsize) >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize =3D count; >>> - =A0 =A0 if (data) >>> - =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(clnt, P9_TWRITE, "d= qD", fid->fid, offset, >>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize, data); >>> - =A0 =A0 else >>> - =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(clnt, P9_TWRITE, "d= qU", fid->fid, offset, >>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize, udata); >>> + >>> + =A0 =A0 if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) = =3D=3D >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 P9_TRANS_PREF_PAYLOAD_SEP= ) { >>> + =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(clnt, P9_TWRITE, "d= qF", fid->fid, offset, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rsize, da= ta ? data : udata); >>> + =A0 =A0 } else { >>> + =A0 =A0 =A0 =A0 =A0 =A0 if (data) >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(cln= t, P9_TWRITE, "dqD", fid->fid, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 offset, rsize, data); >>> + =A0 =A0 =A0 =A0 =A0 =A0 else >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 req =3D p9_client_rpc(cln= t, P9_TWRITE, "dqU", fid->fid, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 offset, rsize, udata); >>> + =A0 =A0 } >>> + >>> =A0 =A0 =A0 if (IS_ERR(req)) { >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D PTR_ERR(req); >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto error; >>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c >>> index dfc358f..ea778dd 100644 >>> --- a/net/9p/protocol.c >>> +++ b/net/9p/protocol.c >>> @@ -114,6 +114,24 @@ pdu_write_u(struct p9_fcall *pdu, const char _= _user *udata, size_t size) >>> =A0 =A0 =A0 return size - len; >>> =A0} >>> >>> +static size_t >>> +pdu_write_uw(struct p9_fcall *pdu, const char *udata, size_t size) >>> +{ >>> + =A0 =A0 size_t len =3D min(pdu->capacity - pdu->size, size); >>> + =A0 =A0 pdu->pbuf =3D udata; >>> + =A0 =A0 pdu->pbuf_size =3D len; >>> + =A0 =A0 return size - len; >>> +} >>> + >>> +static size_t >>> +pdu_write_ur(struct p9_fcall *pdu, const char *udata, size_t size) >>> +{ >>> + =A0 =A0 size_t len =3D min(pdu->capacity - pdu->size, size); >>> + =A0 =A0 pdu->pbuf =3D udata; >>> + =A0 =A0 pdu->pbuf_size =3D len; >>> + =A0 =A0 return size - len; >>> +} >>> + >>> =A0/* >>> =A0 =A0 =A0 b - int8_t >>> =A0 =A0 =A0 w - int16_t >>> @@ -445,6 +463,26 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_= version, const char *fmt, >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 errcode =3D -EFAULT; >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; >>> + =A0 =A0 =A0 =A0 =A0 =A0 case 'E':{ >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int32_= t count =3D va_arg(ap, int32_t); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0const = char *udata =3D va_arg(ap, const void *); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0errcod= e =3D p9pdu_writef(pdu, proto_version, "d", >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0count); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!e= rrcode && pdu_write_ur(pdu, udata, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0count)) >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0errcode =3D -EFAULT; >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; >>> + =A0 =A0 =A0 =A0 =A0 =A0 case 'F':{ >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int32_= t count =3D va_arg(ap, int32_t); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0const = char *udata =3D va_arg(ap, const void *); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0errcod= e =3D p9pdu_writef(pdu, proto_version, "d", >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0count); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!e= rrcode && pdu_write_uw(pdu, udata, >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0count)) >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0errcode =3D -EFAULT; >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 case 'U':{ >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 int32_t= count =3D va_arg(ap, int32_t); >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 const c= har __user *udata =3D >> >> >> >> --------------------------------------------------------------------= ---------- >> The modern datacenter depends on network connectivity to access reso= urces >> and provide services. The best practices for maximizing a physical s= erver's >> connectivity to a physical network are well understood - see how the= se >> rules translate into the virtual world? >> http://p.sf.net/sfu/oracle-sfdevnlfb >> _______________________________________________ >> V9fs-developer mailing list >> V9fs-developer@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/v9fs-developer >> > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html