From: Vlastimil Babka <vbabka@suse.cz>
To: Robert C Jennings <rcj@linux.vnet.ibm.com>, linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave@sr71.net>,
	Matt Helsley <matt.helsley@gmail.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	Lei Li <lilei@linux.vnet.ibm.com>,
	Leonardo Garcia <lagarcia@linux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] vmsplice: unmap gifted pages for recipient
Date: Thu, 17 Oct 2013 12:20:30 +0200
Message-ID: <525FB9EE.3070609@suse.cz>
In-Reply-To: <1381177293-27125-2-git-send-email-rcj@linux.vnet.ibm.com>

On 10/07/2013 10:21 PM, Robert C Jennings wrote:
> Introduce use of the unused SPLICE_F_MOVE flag for vmsplice to zap
> pages.
> 
> When vmsplice is called with flags (SPLICE_F_GIFT | SPLICE_F_MOVE) the
> writer's gift'ed pages would be zapped.  This patch supports further work
> to move vmsplice'd pages rather than copying them.  That patch has the
> restriction that the page must not be mapped by the source for the move,
> otherwise it will fall back to copying the page.
> 
> Signed-off-by: Matt Helsley <matt.helsley@gmail.com>
> Signed-off-by: Robert C Jennings <rcj@linux.vnet.ibm.com>
> ---
> Since the RFC went out I have coalesced the zap_page_range() call to
> operate on VMAs rather than calling this for each page.  For a 256MB
> vmsplice this reduced the write side 50% from the RFC.
> ---
>  fs/splice.c            | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/splice.h |  1 +
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 3b7ee65..a62d61e 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -188,12 +188,17 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  {
>  	unsigned int spd_pages = spd->nr_pages;
>  	int ret, do_wakeup, page_nr;
> +	struct vm_area_struct *vma;
> +	unsigned long user_start, user_end;
>  
>  	ret = 0;
>  	do_wakeup = 0;
>  	page_nr = 0;
> +	vma = NULL;
> +	user_start = user_end = 0;
>  
>  	pipe_lock(pipe);
> +	down_read(&current->mm->mmap_sem);

It seems you could take mmap_sem only when both GIFT and MOVE are set.
That may not help performance much, but it would at least document
why the lock is needed.
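
To illustrate the suggestion, here is a minimal userspace sketch, not
kernel code. The MODEL_* flag names, struct model_mm and both functions
are hypothetical stand-ins; the counters only model where
down_read()/up_read() on mmap_sem would be called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SPLICE_F_MOVE / SPLICE_F_GIFT. */
#define MODEL_SPLICE_F_MOVE 0x01u
#define MODEL_SPLICE_F_GIFT 0x08u

/* Models the reader side of mmap_sem with plain counters. */
struct model_mm {
	int read_locks_held;
	int lock_acquisitions;
};

/* The lock is only needed when pages may be zapped: GIFT and MOVE both set. */
static bool needs_mmap_sem(unsigned int flags)
{
	const unsigned int both = MODEL_SPLICE_F_GIFT | MODEL_SPLICE_F_MOVE;

	return (flags & both) == both;
}

/* Returns the number of read locks still held on exit (should be 0). */
static int model_splice_to_pipe(struct model_mm *mm, unsigned int flags)
{
	bool locked = needs_mmap_sem(flags);

	if (locked) {
		/* stands in for down_read(&current->mm->mmap_sem) */
		mm->read_locks_held++;
		mm->lock_acquisitions++;
	}

	/* ... the pipe-filling loop would run here ... */

	if (locked) {
		/* stands in for up_read(&current->mm->mmap_sem) */
		mm->read_locks_held--;
	}

	return mm->read_locks_held;
}
```

Keeping the test in one helper means the acquire and release sites
cannot disagree about whether the lock was taken.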

Vlastimil

>  	for (;;) {
>  		if (!pipe->readers) {
> @@ -212,8 +217,44 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  			buf->len = spd->partial[page_nr].len;
>  			buf->private = spd->partial[page_nr].private;
>  			buf->ops = spd->ops;
> -			if (spd->flags & SPLICE_F_GIFT)
> +			if (spd->flags & SPLICE_F_GIFT) {
> +				unsigned long useraddr =
> +						spd->partial[page_nr].useraddr;
> +
> +				if ((spd->flags & SPLICE_F_MOVE) &&
> +						!buf->offset &&
> +						(buf->len == PAGE_SIZE)) {
> +					/* Can move page aligned buf, gather
> +					 * requests to make a single
> +					 * zap_page_range() call per VMA
> +					 */
> +					if (vma && (useraddr == user_end) &&
> +						   ((useraddr + PAGE_SIZE) <=
> +						    vma->vm_end)) {
> +						/* same vma, no holes */
> +						user_end += PAGE_SIZE;
> +					} else {
> +						if (vma)
> +							zap_page_range(vma,
> +								user_start,
> +								(user_end -
> +								 user_start),
> +								NULL);
> +						vma = find_vma_intersection(
> +								current->mm,
> +								useraddr,
> +								(useraddr +
> +								 PAGE_SIZE));
> +						if (!IS_ERR_OR_NULL(vma)) {
> +							user_start = useraddr;
> +							user_end = (useraddr +
> +								    PAGE_SIZE);
> +						} else
> +							vma = NULL;
> +					}
> +				}
>  				buf->flags |= PIPE_BUF_FLAG_GIFT;
> +			}
>  
>  			pipe->nrbufs++;
>  			page_nr++;
> @@ -255,6 +296,10 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  		pipe->waiting_writers--;
>  	}
>  
> +	if (vma)
> +		zap_page_range(vma, user_start, (user_end - user_start), NULL);
> +
> +	up_read(&current->mm->mmap_sem);
>  	pipe_unlock(pipe);
>  
>  	if (do_wakeup)
> @@ -485,6 +530,7 @@ fill_it:
>  
>  		spd.partial[page_nr].offset = loff;
>  		spd.partial[page_nr].len = this_len;
> +		spd.partial[page_nr].useraddr = index << PAGE_CACHE_SHIFT;
>  		len -= this_len;
>  		loff = 0;
>  		spd.nr_pages++;
> @@ -656,6 +702,7 @@ ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
>  		this_len = min_t(size_t, vec[i].iov_len, res);
>  		spd.partial[i].offset = 0;
>  		spd.partial[i].len = this_len;
> +		spd.partial[i].useraddr = (unsigned long)vec[i].iov_base;
>  		if (!this_len) {
>  			__free_page(spd.pages[i]);
>  			spd.pages[i] = NULL;
> @@ -1475,6 +1522,8 @@ static int get_iovec_page_array(const struct iovec __user *iov,
>  
>  			partial[buffers].offset = off;
>  			partial[buffers].len = plen;
> +			partial[buffers].useraddr = (unsigned long)base;
> +			base = (void*)((unsigned long)base + PAGE_SIZE);
>  
>  			off = 0;
>  			len -= plen;
> diff --git a/include/linux/splice.h b/include/linux/splice.h
> index 74575cb..56661e3 100644
> --- a/include/linux/splice.h
> +++ b/include/linux/splice.h
> @@ -44,6 +44,7 @@ struct partial_page {
>  	unsigned int offset;
>  	unsigned int len;
>  	unsigned long private;
> +	unsigned long useraddr;
>  };
>  
>  /*
> 
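
For reference, the range gathering in the quoted splice_to_pipe() hunk
can be modelled in userspace roughly as below. This is a sketch under
assumptions: struct model_vma, model_find_vma() and model_zap() are
hypothetical stand-ins for vm_area_struct, find_vma_intersection() and
zap_page_range(), the page size is fixed at 4096, and every address is
treated as a full, page-aligned buffer. Pipe capacity is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_MAX_ZAPS  16

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;	/* exclusive, as in the kernel */
};

/* Records each modelled zap call so the result can be inspected. */
struct zap_log {
	size_t calls;
	unsigned long start[MODEL_MAX_ZAPS];
	unsigned long len[MODEL_MAX_ZAPS];
};

/* First VMA overlapping [start, end), or NULL if none does. */
static const struct model_vma *model_find_vma(const struct model_vma *vmas,
					      size_t n, unsigned long start,
					      unsigned long end)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].vm_start < end && start < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

static void model_zap(struct zap_log *log, unsigned long start,
		      unsigned long len)
{
	if (log->calls < MODEL_MAX_ZAPS) {
		log->start[log->calls] = start;
		log->len[log->calls] = len;
	}
	log->calls++;
}

/* Gather contiguous pages within one VMA and zap each run once. */
static void gather_and_zap(const struct model_vma *vmas, size_t nvmas,
			   const unsigned long *addrs, size_t naddrs,
			   struct zap_log *log)
{
	const struct model_vma *vma = NULL;
	unsigned long user_start = 0, user_end = 0;

	for (size_t i = 0; i < naddrs; i++) {
		unsigned long useraddr = addrs[i];

		if (vma && useraddr == user_end &&
		    useraddr + MODEL_PAGE_SIZE <= vma->vm_end) {
			/* same vma, no holes: extend the pending run */
			user_end += MODEL_PAGE_SIZE;
			continue;
		}
		if (vma)	/* flush the run gathered so far */
			model_zap(log, user_start, user_end - user_start);
		vma = model_find_vma(vmas, nvmas, useraddr,
				     useraddr + MODEL_PAGE_SIZE);
		if (vma) {
			user_start = useraddr;
			user_end = useraddr + MODEL_PAGE_SIZE;
		}
	}
	if (vma)		/* flush the final run */
		model_zap(log, user_start, user_end - user_start);
}
```

In this model a run is flushed when the next page is not contiguous,
would cross the end of the current VMA, or falls outside any VMA, so
the number of zap calls tracks the number of runs rather than pages.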


