All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: David Miller <davem@davemloft.net>, Jens Axboe <axboe@kernel.dk>,
	netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [take 2] Fix for the fundamental network/block layer race in sendfile().
Date: Tue, 08 Apr 2008 14:58:43 +0200	[thread overview]
Message-ID: <47FB6C03.3070708@cosmosbay.com> (raw)
In-Reply-To: <20080408122545.GA31729@2ka.mipt.ru>

Evgeniy Polyakov a écrit :
> Hi.
>
> Summary of the previous series with this pompous header:
> when sendfile() returns, pages which it sent can still be queued in tcp
> stack or hardware, so subsequent write into them will endup in
> corrupting data which will be eventually sent. This concerns all
> ->sendpage() users namely sendfile() and splice().
>
> With this patch it is now guaranteed that data was transferred over the
> wire after splice/sendfile returns.
>
>   
1) So a nonblocking operation (O_NDELAY) can become a blocking one now ?

> This approach (besides the fact that previous one was not 100% correct
> dealing with cloned skbs) uses new destructor field in shared info
> structure (introduced by Rusty, afacs they do not collide), which is
> invoked each time data is about to be released (but before actual
> release of data happens).
>   
> Patch works by assigning each page from the sendfile path a couple of
>
>   
> pointers: to page itself to differentiate sendfile pages from all
> others and pointer to pipe structure, which wakes up splice code.
> It is safe to assign the same callback for sendfile and non-sendfile
> pages because of above page pointer - pages which do not have it (slab,
> non-sendfile bio layer and others) will not have pipe structure
> dereferenced from private pointers.
>
> P.S. Previous patch was not an April 1 joke as long as this one :)
>
>   
Hum, I see you forgot me in CC list, dont try to escape :)
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 49a9871..1961a46 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -16,6 +16,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/pagemap.h>
>  #include <linux/splice.h>
> +#include <net/sock.h>
>  #include "read_write.h"
>  
>  #include <asm/uaccess.h>
> @@ -695,6 +696,24 @@ sys_writev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen)
>  	return ret;
>  }
>  
> +static void skb_splice_destructor(struct skb_shared_info *shi)
> +{
> +	if (shi->nr_frags) {
> +		int i;
> +		struct pipe_inode_info *pipe;
> +
> +		for (i = 0; i < shi->nr_frags; i++) {
> +			struct page *page = shi->frags[i].page;
> +
> +			if (page->lru.prev == (struct list_head *)page) {
> +				pipe = (struct pipe_inode_info *)page->lru.next;
>   
    Unless I mistaken, you store in page->lru.next some info to find 
your pipe pointer, assuming it is unique for this page.

What happens if two threads are doing a splice()/sendfile() using the 
same underlying (source) file (and same pages in this file)


> +				if (atomic_dec_return(&pipe->god_blessed_us) == 0)
> +					wake_up(&pipe->wait);
> +			}
> +		}
> +	}
> +}
> +
>  static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
>  			   size_t count, loff_t max)
>  {
> @@ -703,6 +722,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
>  	loff_t pos;
>  	ssize_t retval;
>  	int fput_needed_in, fput_needed_out, fl;
> +	struct sock *sk;
> +	struct socket *sock;
>  
>  	/*
>  	 * Get input file, and verify that it is ok..
> @@ -762,6 +783,12 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
>  		count = max - pos;
>  	}
>  
> +	sock = out_file->private_data;
> +	sk = sock->sk;
> +
> +	sk->sk_user_data = &skb_splice_destructor;
> +	set_bit(SOCK_PRIVATE_CALLBACK, &sock->flags);
> +
>  	fl = 0;
>  #if 0
>  	/*
> @@ -775,6 +802,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
>  #endif
>  	retval = do_splice_direct(in_file, ppos, out_file, count, fl);
>  
> +	clear_bit(SOCK_PRIVATE_CALLBACK, &sock->flags);
> +
>  	if (retval > 0) {
>  		add_rchar(current, retval);
>  		add_wchar(current, retval);
> diff --git a/fs/splice.c b/fs/splice.c
> index 0670c91..dcda32e 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -29,6 +29,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/uio.h>
>  #include <linux/security.h>
> +#include <net/sock.h>
>  
>  /*
>   * Attempt to steal a page from a pipe buffer. This should perhaps go into
> @@ -535,6 +536,8 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
>  	if (!ret) {
>  		more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
>  
> +		buf->page->lru.next = (void *)pipe;
>   
is it really allowed here, are you the only user ot this page ?
> +		buf->page->lru.prev = (void *)buf->page;
>  		ret = file->f_op->sendpage(file, buf->page, buf->offset,
>  					   sd->len, &pos, more);
>  	}
> @@ -629,6 +632,9 @@ ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
>  	ret = 0;
>  	do_wakeup = 0;
>   





  reply	other threads:[~2008-04-08 12:58 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-28  9:20 Network/block layer race Evgeniy Polyakov
2008-03-28 20:40 ` David Miller
2008-03-28 20:56   ` Evgeniy Polyakov
2008-03-28 21:07     ` David Miller
2008-03-28 21:51       ` Evgeniy Polyakov
2008-04-01 16:49         ` Fix for the fundamental network/block layer race in sendfile() Evgeniy Polyakov
2008-04-01 17:14           ` Mika Penttilä
2008-04-01 17:36             ` Evgeniy Polyakov
2008-04-01 17:19           ` Eric Dumazet
2008-04-01 17:47             ` Evgeniy Polyakov
2008-04-01 18:07               ` Evgeniy Polyakov
2008-04-01 19:21                 ` Eric Dumazet
2008-04-01 19:45                   ` Evgeniy Polyakov
2008-04-01 20:59                     ` Eric Dumazet
2008-04-01 21:14                       ` Evgeniy Polyakov
2008-04-08 12:25           ` [take 2] " Evgeniy Polyakov
2008-04-08 12:58             ` Eric Dumazet [this message]
2008-04-08 17:26               ` Evgeniy Polyakov
2008-04-08 21:30                 ` Evgeniy Polyakov
2008-04-09 11:33                   ` Jens Axboe
2011-06-06 16:29             ` IPv6 DNSSL (rfc6106): please include the patch to pass it to user space Carlos Carvalho
2011-06-06 19:40               ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47FB6C03.3070708@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=axboe@kernel.dk \
    --cc=davem@davemloft.net \
    --cc=johnpol@2ka.mipt.ru \
    --cc=netdev@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.