From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Venkateswararao Jujjuri (JV)"
Subject: Re: [RFC-V3 6/7] [net/9p] Read and Write side zerocopy changes for 9P2000.L protocol.
Date: Fri, 11 Feb 2011 13:03:44 -0800
Message-ID: <4D55A430.8070406@linux.vnet.ibm.com>
References: <1297387511-2697-1-git-send-email-jvrao@linux.vnet.ibm.com> <1297387511-2697-7-git-send-email-jvrao@linux.vnet.ibm.com> <87mxm2nz49.fsf@linux.vnet.ibm.com>
In-Reply-To: <87mxm2nz49.fsf@linux.vnet.ibm.com>
Cc: v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org
To: "Aneesh Kumar K. V"

On 2/11/2011 11:35 AM, Aneesh Kumar K.
V wrote:
> On Thu, 10 Feb 2011 17:25:10 -0800, "Venkateswararao Jujjuri (JV)" wrote:
>> Signed-off-by: Venkateswararao Jujjuri
>> ---
>>  net/9p/client.c   |   47 +++++++++++++++++++++++++++++++++--------------
>>  net/9p/protocol.c |   21 +++++++++++++++++++++
>>  2 files changed, 54 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index a848bca..f6d8531 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -1270,7 +1270,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>  	if (count < rsize)
>>  		rsize = count;
>>
>> -	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
>> +	/* Don't bother zerocopy form small IO (< 1024) */
>> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
>> +				rsize, data, udata);
>> +	} else {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
>> +				rsize);
>> +	}
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> @@ -1284,13 +1292,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>
>>  	P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
>>
>> -	if (data) {
>> -		memmove(data, dataptr, count);
>> -	} else {
>> -		err = copy_to_user(udata, dataptr, count);
>> -		if (err) {
>> -			err = -EFAULT;
>> -			goto free_and_error;
>> +	if (!req->tc->pbuf_size) {
>> +		if (data) {
>> +			memmove(data, dataptr, count);
>> +		} else {
>> +			err = copy_to_user(udata, dataptr, count);
>> +			if (err) {
>> +				err = -EFAULT;
>> +				goto free_and_error;
>> +			}
>>  		}
>>  	}
>>  	p9_free_req(clnt, req);
>> @@ -1323,12 +1333,21 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>>
>>  	if (count < rsize)
>>  		rsize = count;
>> -	if (data)
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>> -				rsize, data);
>> -	else
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>> -				rsize, udata);
>> +
>> +	/* Don't bother zerocopy form small IO (< 1024) */
>> +	if (((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) && (rsize > 1024)) {
>> +		req = p9_client_rpc(clnt, P9_TWRITE, "dqE", fid->fid, offset,
>> +				rsize, data, udata);
>> +	} else {
>> +		if (data)
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
>> +					offset, rsize, data);
>> +		else
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
>> +					offset, rsize, udata);
>> +	}
>> +
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
>> index 5936c50..830b999 100644
>> --- a/net/9p/protocol.c
>> +++ b/net/9p/protocol.c
>> @@ -114,6 +114,17 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>>  	return size - len;
>>  }
>>
>> +static size_t
>> +pdu_write_urw(struct p9_fcall *pdu, const char *kdata, const char __user *udata,
>> +		size_t size)
>> +{
>> +	size_t len = min(pdu->capacity - pdu->size, size);
>
> Why do we need to do this ? We are not placing anything in the pdu right ?

Even so, per the protocol each packet must be <= msize.
I will post another patch series that introduces a second size, bufsize, in
addition to msize. bufsize will represent small PDUs (bufsize <= msize).

>
>> +	pdu->pubuf = (char __user *)udata;
>> +	pdu->pkbuf = (char *)kdata;
>> +	pdu->pbuf_size = len;
>> +	return size - len;
>
> Does this mean a zero copy write of a buffer larger than msize will
> result in a failure ?

No, there are no failures. If you recall, none of our read/write routines
guarantees writing the entire request in one shot. At multiple levels we
adjust the read/write size, and the top level takes care of it by dividing
the buffer into multiple reads/writes.
This is nothing new in zero copy; the same thing happens in the current
code (pdu_write_u).

- JV

>
>> +}
>> +
>>  /*
>>  	b - int8_t
>>  	w - int16_t
>> @@ -445,6 +456,16 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
>>  				errcode = -EFAULT;
>>  			}
>>  			break;
>> +		case 'E':{
>> +				int32_t cnt = va_arg(ap, int32_t);
>> +				const char *k = va_arg(ap, const void *);
>> +				const char *u = va_arg(ap, const void *);
>> +				errcode = p9pdu_writef(pdu, proto_version, "d",
>> +						cnt);
>> +				if (!errcode && pdu_write_urw(pdu, k, u, cnt))
>> +					errcode = -EFAULT;
>> +			}
>> +			break;
>>  		case 'U':{
>>  				int32_t count = va_arg(ap, int32_t);
>>  				const char __user *udata =
>
>
> -aneesh
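
To make the splitting argument above concrete, here is a minimal userspace sketch in C. It is not the kernel code: it assumes a simplified model in which the caller first clamps each request to msize minus a fixed header size, so the PDU-level clamp (the same arithmetic as pdu_write_urw in the patch) never leaves a remainder. The names pdu_sketch, attach_payload, clamp_rsize and chunked_write, and the header size passed in, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct p9_fcall; only the fields used here. */
struct pdu_sketch {
	size_t size;      /* bytes already consumed by the request header */
	size_t capacity;  /* upper bound for the whole packet (msize) */
	size_t pbuf_size; /* payload bytes attached without copying */
};

/*
 * Same clamping arithmetic as pdu_write_urw() in the patch: attach at most
 * the room left under capacity and return how many bytes did not fit.
 */
static size_t attach_payload(struct pdu_sketch *pdu, size_t size)
{
	size_t room = pdu->capacity - pdu->size;
	size_t len = size < room ? size : room;

	pdu->pbuf_size = len;
	return size - len;
}

/* Assumed client-level clamp: one request carries at most msize - hdr bytes. */
static size_t clamp_rsize(size_t count, size_t msize, size_t hdr)
{
	size_t max_payload = msize - hdr;

	return count < max_payload ? count : max_payload;
}

/*
 * Hypothetical top-level loop: split 'total' bytes into as many requests
 * as needed and return the number of requests issued.
 */
static unsigned int chunked_write(size_t total, size_t msize, size_t hdr)
{
	unsigned int nreq = 0;
	size_t done = 0;

	if (msize <= hdr)
		return 0;	/* no room for payload; nothing can be sent */

	while (done < total) {
		struct pdu_sketch pdu = { hdr, msize, 0 };
		size_t rsize = clamp_rsize(total - done, msize, hdr);
		size_t left = attach_payload(&pdu, rsize);

		/* The request was clamped first, so nothing is left over. */
		assert(left == 0);
		done += pdu.pbuf_size;
		nreq++;
	}
	return nreq;
}
```

In this model, with msize 4096 and an illustrative 24-byte header, chunked_write(10000, 4096, 24) issues three requests carrying 4072, 4072 and 1856 bytes. Calling attach_payload directly with an unclamped size is the only way to get a nonzero remainder, which is the case the 'E' format handler maps to -EFAULT.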