From: Robert Jennings <rcj@linux.vnet.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Alexander Viro <viro@zeniv.linux.org.uk>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Dave Hansen <dave@sr71.net>,
Matt Helsley <matt.helsley@gmail.com>,
Anthony Liguori <anthony@codemonkey.ws>,
Michael Roth <mdroth@linux.vnet.ibm.com>,
Lei Li <lilei@linux.vnet.ibm.com>,
Leonardo Garcia <lagarcia@linux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] vmsplice: unmap gifted pages for recipient
Date: Thu, 17 Oct 2013 08:48:27 -0500 [thread overview]
Message-ID: <20131017134827.GB19741@linux.vnet.ibm.com> (raw)
In-Reply-To: <525FB9EE.3070609@suse.cz>
* Vlastimil Babka (vbabka@suse.cz) wrote:
> On 10/07/2013 10:21 PM, Robert C Jennings wrote:
> > Introduce use of the unused SPLICE_F_MOVE flag for vmsplice to zap
> > pages.
> >
> > When vmsplice is called with flags (SPLICE_F_GIFT | SPLICE_F_MOVE) the
> > writer's gift'ed pages would be zapped. This patch supports further work
> > to move vmsplice'd pages rather than copying them. That patch has the
> > restriction that the page must not be mapped by the source for the move,
> > otherwise it will fall back to copying the page.
> >
> > Signed-off-by: Matt Helsley <matt.helsley@gmail.com>
> > Signed-off-by: Robert C Jennings <rcj@linux.vnet.ibm.com>
> > ---
> > Since the RFC went out I have coalesced the zap_page_range() call to
> > operate on VMAs rather than calling this for each page. For a 256MB
> > vmsplice this reduced the write side 50% from the RFC.
> > ---
> > fs/splice.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/splice.h | 1 +
> > 2 files changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 3b7ee65..a62d61e 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -188,12 +188,17 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > {
> > unsigned int spd_pages = spd->nr_pages;
> > int ret, do_wakeup, page_nr;
> > + struct vm_area_struct *vma;
> > + unsigned long user_start, user_end;
> >
> > ret = 0;
> > do_wakeup = 0;
> > page_nr = 0;
> > + vma = NULL;
> > + user_start = user_end = 0;
> >
> > pipe_lock(pipe);
> > + down_read(¤t->mm->mmap_sem);
>
> Seems like you could take the mmap_sem only when GIFT and MOVE is set.
> Maybe it won't help that much for performance but at least serve as
> documenting the reason it's needed?
>
> Vlastimil
>
I had been doing that previously but moving this outside the loop and
acquiring it once did improve performance. I'll add a comment on
down_read() as to the reason for taking this though.
-Rob
> > for (;;) {
> > if (!pipe->readers) {
> > @@ -212,8 +217,44 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > buf->len = spd->partial[page_nr].len;
> > buf->private = spd->partial[page_nr].private;
> > buf->ops = spd->ops;
> > - if (spd->flags & SPLICE_F_GIFT)
> > + if (spd->flags & SPLICE_F_GIFT) {
> > + unsigned long useraddr =
> > + spd->partial[page_nr].useraddr;
> > +
> > + if ((spd->flags & SPLICE_F_MOVE) &&
> > + !buf->offset &&
> > + (buf->len == PAGE_SIZE)) {
> > + /* Can move page aligned buf, gather
> > + * requests to make a single
> > + * zap_page_range() call per VMA
> > + */
> > + if (vma && (useraddr == user_end) &&
> > + ((useraddr + PAGE_SIZE) <=
> > + vma->vm_end)) {
> > + /* same vma, no holes */
> > + user_end += PAGE_SIZE;
> > + } else {
> > + if (vma)
> > + zap_page_range(vma,
> > + user_start,
> > + (user_end -
> > + user_start),
> > + NULL);
> > + vma = find_vma_intersection(
> > + current->mm,
> > + useraddr,
> > + (useraddr +
> > + PAGE_SIZE));
> > + if (!IS_ERR_OR_NULL(vma)) {
> > + user_start = useraddr;
> > + user_end = (useraddr +
> > + PAGE_SIZE);
> > + } else
> > + vma = NULL;
> > + }
> > + }
> > buf->flags |= PIPE_BUF_FLAG_GIFT;
> > + }
> >
> > pipe->nrbufs++;
> > page_nr++;
> > @@ -255,6 +296,10 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > pipe->waiting_writers--;
> > }
> >
> > + if (vma)
> > + zap_page_range(vma, user_start, (user_end - user_start), NULL);
> > +
> > + up_read(¤t->mm->mmap_sem);
> > pipe_unlock(pipe);
> >
> > if (do_wakeup)
> > @@ -485,6 +530,7 @@ fill_it:
> >
> > spd.partial[page_nr].offset = loff;
> > spd.partial[page_nr].len = this_len;
> > + spd.partial[page_nr].useraddr = index << PAGE_CACHE_SHIFT;
> > len -= this_len;
> > loff = 0;
> > spd.nr_pages++;
> > @@ -656,6 +702,7 @@ ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
> > this_len = min_t(size_t, vec[i].iov_len, res);
> > spd.partial[i].offset = 0;
> > spd.partial[i].len = this_len;
> > + spd.partial[i].useraddr = (unsigned long)vec[i].iov_base;
> > if (!this_len) {
> > __free_page(spd.pages[i]);
> > spd.pages[i] = NULL;
> > @@ -1475,6 +1522,8 @@ static int get_iovec_page_array(const struct iovec __user *iov,
> >
> > partial[buffers].offset = off;
> > partial[buffers].len = plen;
> > + partial[buffers].useraddr = (unsigned long)base;
> > + base = (void*)((unsigned long)base + PAGE_SIZE);
> >
> > off = 0;
> > len -= plen;
> > diff --git a/include/linux/splice.h b/include/linux/splice.h
> > index 74575cb..56661e3 100644
> > --- a/include/linux/splice.h
> > +++ b/include/linux/splice.h
> > @@ -44,6 +44,7 @@ struct partial_page {
> > unsigned int offset;
> > unsigned int len;
> > unsigned long private;
> > + unsigned long useraddr;
> > };
> >
> > /*
> >
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-10-17 13:50 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-07 20:21 [PATCH 0/2] vmpslice support for zero-copy gifting of pages Robert C Jennings
2013-10-07 20:21 ` [PATCH 1/2] vmsplice: unmap gifted pages for recipient Robert C Jennings
2013-10-08 16:14 ` Dave Hansen
2013-10-08 19:48 ` Robert Jennings
2013-10-08 21:22 ` Dave Hansen
2013-10-08 16:23 ` Dave Hansen
2013-10-17 13:54 ` Robert Jennings
2013-10-17 10:20 ` Vlastimil Babka
2013-10-17 13:48 ` Robert Jennings [this message]
2013-10-18 8:10 ` Vlastimil Babka
2013-10-07 20:21 ` [PATCH 2/2] vmsplice: Add limited zero copy to vmsplice Robert C Jennings
2013-10-08 16:45 ` Dave Hansen
2013-10-08 17:35 ` Robert Jennings
2013-10-17 11:23 ` Vlastimil Babka
2013-10-17 13:44 ` Robert Jennings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131017134827.GB19741@linux.vnet.ibm.com \
--to=rcj@linux.vnet.ibm.com \
--cc=aarcange@redhat.com \
--cc=anthony@codemonkey.ws \
--cc=dave@sr71.net \
--cc=lagarcia@linux.vnet.ibm.com \
--cc=lilei@linux.vnet.ibm.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matt.helsley@gmail.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).