From: Chuck Lever III <chuck.lever@oracle.com>
To: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
Matthew Wilcox <willy@infradead.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>, Jeffrey Layton <jlayton@kernel.org>,
Christian Brauner <brauner@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Anna Schumaker <anna@kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [RFC PATCH 27/28] sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
Date: Thu, 16 Mar 2023 17:10:21 +0000
Message-ID: <4EDC79DC-0C32-40FD-9C35-164C7A077922@oracle.com>
In-Reply-To: <3DFBF27C-A62B-4AFE-87FD-3DF53FC39E8E@hammerspace.com>

Note: this is the first I've seen of this series -- not sure why
I never received any of these patches.
That means I haven't seen the cover letter and do not have any
context for this proposed change.
> On Mar 16, 2023, at 12:17 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
>> On Mar 16, 2023, at 11:26, David Howells <dhowells@redhat.com> wrote:
>>
>> When transmitting data, call down into TCP using a single sendmsg with
>> MSG_SPLICE_PAGES to indicate that content should be spliced rather than
>> performing several sendmsg and sendpage calls to transmit header, data
>> pages and trailer.
We've tried combining the sendpage calls in here before. It
resulted in a significant and measurable performance regression.
See:
da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends")
and its subsequent revert:
4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Therefore, this kind of change needs to be accompanied by both
benchmark results and some field testing to convince me it won't
cause harm.
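
For context, here is a condensed sketch of the send path that
4a85a6a3320b restored and that this patch removes (error handling
elided; condensed from the hunk quoted below, so treat it as an
illustration, not the exact code):

	/* The record marker, head kvec, each payload page, and the
	 * tail kvec each go to the socket as a separate call today: */
	ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len);

	ret = svc_tcp_send_kvec(sock, head, 0);	/* kernel_sendpage() */

	bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
	offset = offset_in_page(xdr->page_base);
	remaining = xdr->page_len;
	while (remaining > 0) {
		len = min(remaining, bvec->bv_len - offset);
		ret = kernel_sendpage(sock, bvec->bv_page,
				      bvec->bv_offset + offset, len, 0);
		remaining -= len;
		offset = 0;
		bvec++;
	}

	ret = svc_tcp_send_kvec(sock, tail, 0);
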
Also, I'd rather see struct xdr_buf changed to /replace/ the
head/pagevec/tail arrangement with bvecs before we do this
kind of overhaul.
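
Roughly what I have in mind -- purely illustrative, not a
worked-out proposal:

	/* Illustrative only: one bio_vec array covering what is now
	 * head + pages + tail, so the transports never have to stitch
	 * the three pieces together themselves. */
	struct xdr_buf {
		struct bio_vec	*bvec;	/* header, payload pages, and
					 * trailer in a single array */
		unsigned int	nvec;	/* entries in @bvec */
		unsigned int	len;	/* total length in bytes */
		/* ... remaining fields as today ... */
	};
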
And, we have to make certain that this doesn't break operation
with kTLS sockets... do they support MSG_SPLICE_PAGES?
>> To make this work, the data is assembled in a bio_vec array and attached to
>> a BVEC-type iterator. The bio_vec array has two extra slots before the
>> first for headers and one after the last for a trailer. The headers and
>> trailer are copied into memory acquired from zcopy_alloc() which just
>> breaks a page up into small pieces that can be freed with put_page().
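
(If I'm reading the patch correctly, the resulting array looks like
this -- my annotation, not text from the series:)

	/* Layout after xdr_alloc_bvec(), where n = xdr_buf_pagecount():
	 *
	 *   bvec[-2]      record marker  -- zcopy_memdup() copy
	 *   bvec[-1]      xdr->head      -- zcopy_memdup() copy
	 *   bvec[0..n-1]  xdr->pages     -- the payload, referenced in place
	 *   bvec[n]       xdr->tail      -- zcopy_memdup() copy
	 *
	 * and the whole span is then handed to a single bvec iterator:
	 */
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, xdr->bvec - 2, n + 3, size);
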
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Trond Myklebust <trond.myklebust@hammerspace.com>
>> cc: Anna Schumaker <anna@kernel.org>
>> cc: Chuck Lever <chuck.lever@oracle.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: "David S. Miller" <davem@davemloft.net>
>> cc: Eric Dumazet <edumazet@google.com>
>> cc: Jakub Kicinski <kuba@kernel.org>
>> cc: Paolo Abeni <pabeni@redhat.com>
>> cc: Jens Axboe <axboe@kernel.dk>
>> cc: Matthew Wilcox <willy@infradead.org>
>> cc: linux-nfs@vger.kernel.org
>> cc: netdev@vger.kernel.org
>> ---
>> net/sunrpc/svcsock.c | 70 ++++++++++++--------------------------------
>> net/sunrpc/xdr.c | 24 ++++++++++++---
>> 2 files changed, 38 insertions(+), 56 deletions(-)
>>
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 03a4f5615086..1fa41ddbc40e 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -36,6 +36,7 @@
>> #include <linux/skbuff.h>
>> #include <linux/file.h>
>> #include <linux/freezer.h>
>> +#include <linux/zcopy_alloc.h>
>> #include <net/sock.h>
>> #include <net/checksum.h>
>> #include <net/ip.h>
>> @@ -1060,16 +1061,8 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>> return 0; /* record not complete */
>> }
>>
>> -static int svc_tcp_send_kvec(struct socket *sock, const struct kvec *vec,
>> - int flags)
>> -{
>> - return kernel_sendpage(sock, virt_to_page(vec->iov_base),
>> - offset_in_page(vec->iov_base),
>> - vec->iov_len, flags);
>> -}
>> -
>> /*
>> - * kernel_sendpage() is used exclusively to reduce the number of
>> + * MSG_SPLICE_PAGES is used exclusively to reduce the number of
>> * copy operations in this path. Therefore the caller must ensure
>> * that the pages backing @xdr are unchanging.
>> *
>> @@ -1081,65 +1074,38 @@ static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr,
>> {
>> const struct kvec *head = xdr->head;
>> const struct kvec *tail = xdr->tail;
>> - struct kvec rm = {
>> - .iov_base = &marker,
>> - .iov_len = sizeof(marker),
>> - };
>> struct msghdr msg = {
>> - .msg_flags = 0,
>> + .msg_flags = MSG_SPLICE_PAGES,
>> };
>> - int ret;
>> + int ret, n = xdr_buf_pagecount(xdr), size;
>>
>> *sentp = 0;
>> ret = xdr_alloc_bvec(xdr, GFP_KERNEL);
>> if (ret < 0)
>> return ret;
>>
>> - ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len);
>> + ret = zcopy_memdup(sizeof(marker), &marker, &xdr->bvec[-2], GFP_KERNEL);
>> if (ret < 0)
>> return ret;
>> - *sentp += ret;
>> - if (ret != rm.iov_len)
>> - return -EAGAIN;
>>
>> - ret = svc_tcp_send_kvec(sock, head, 0);
>> + ret = zcopy_memdup(head->iov_len, head->iov_base, &xdr->bvec[-1], GFP_KERNEL);
>> if (ret < 0)
>> return ret;
>> - *sentp += ret;
>> - if (ret != head->iov_len)
>> - goto out;
>>
>> - if (xdr->page_len) {
>> - unsigned int offset, len, remaining;
>> - struct bio_vec *bvec;
>> -
>> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
>> - offset = offset_in_page(xdr->page_base);
>> - remaining = xdr->page_len;
>> - while (remaining > 0) {
>> - len = min(remaining, bvec->bv_len - offset);
>> - ret = kernel_sendpage(sock, bvec->bv_page,
>> - bvec->bv_offset + offset,
>> - len, 0);
>> - if (ret < 0)
>> - return ret;
>> - *sentp += ret;
>> - if (ret != len)
>> - goto out;
>> - remaining -= len;
>> - offset = 0;
>> - bvec++;
>> - }
>> - }
>> + ret = zcopy_memdup(tail->iov_len, tail->iov_base, &xdr->bvec[n], GFP_KERNEL);
>> + if (ret < 0)
>> + return ret;
>>
>> - if (tail->iov_len) {
>> - ret = svc_tcp_send_kvec(sock, tail, 0);
>> - if (ret < 0)
>> - return ret;
>> - *sentp += ret;
>> - }
>> + size = sizeof(marker) + head->iov_len + xdr->page_len + tail->iov_len;
>> + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, xdr->bvec - 2, n + 3, size);
>>
>> -out:
>> + ret = sock_sendmsg(sock, &msg);
>> + if (ret < 0)
>> + return ret;
>> + if (ret > 0)
>> + *sentp = ret;
>> + if (ret != size)
>> + return -EAGAIN;
>> return 0;
>> }
>>
>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>> index 36835b2f5446..6dff0b4f17b8 100644
>> --- a/net/sunrpc/xdr.c
>> +++ b/net/sunrpc/xdr.c
>> @@ -145,14 +145,19 @@ xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp)
>> {
>> size_t i, n = xdr_buf_pagecount(buf);
>>
>> - if (n != 0 && buf->bvec == NULL) {
>> - buf->bvec = kmalloc_array(n, sizeof(buf->bvec[0]), gfp);
>> + if (buf->bvec == NULL) {
>> + /* Allow for two headers and a trailer to be attached */
>> + buf->bvec = kmalloc_array(n + 3, sizeof(buf->bvec[0]), gfp);
>> if (!buf->bvec)
>> return -ENOMEM;
>> + buf->bvec += 2;
>> + buf->bvec[-2].bv_page = NULL;
>> + buf->bvec[-1].bv_page = NULL;
>
> NACK.
>
>> for (i = 0; i < n; i++) {
>> bvec_set_page(&buf->bvec[i], buf->pages[i], PAGE_SIZE,
>> 0);
>> }
>> + buf->bvec[n].bv_page = NULL;
>> }
>> return 0;
>> }
>> @@ -160,8 +165,19 @@ xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp)
>> void
>> xdr_free_bvec(struct xdr_buf *buf)
>> {
>> - kfree(buf->bvec);
>> - buf->bvec = NULL;
>> + if (buf->bvec) {
>> + size_t n = xdr_buf_pagecount(buf);
>> +
>> + if (buf->bvec[-2].bv_page)
>> + put_page(buf->bvec[-2].bv_page);
>> + if (buf->bvec[-1].bv_page)
>> + put_page(buf->bvec[-1].bv_page);
>> + if (buf->bvec[n].bv_page)
>> + put_page(buf->bvec[n].bv_page);
>> + buf->bvec -= 2;
>> + kfree(buf->bvec);
>> + buf->bvec = NULL;
>> + }
>> }
>>
>> /**
>>
>
--
Chuck Lever