Re: [PATCH][RFC] network splice receive v2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <jens.axboe@oracle.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: netdev@vger.kernel.org, olaf.kirch@oracle.com
Subject: Re: [PATCH][RFC] network splice receive v2
Date: Fri, 15 Jun 2007 10:43:18 +0200	[thread overview]
Message-ID: <20070615084318.GP6149@kernel.dk> (raw)
In-Reply-To: <20070615080618.GA22234@2ka.mipt.ru>

On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> > > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > > > I will rebase my tree, likely something was not merged correctly.
> > > 
> > > Ok, I've just rebased a tree from recent git and pulled from brick -
> > > things seems to be settled. I've ran several tests with different
> > > filesizes and all files were received correctly without kernel crashes.
> > > There is skb leak indeed, and it looks like it is in the
> > > __splice_from_pipe() for the last page.
> > 
> > Uh, the leak, right - I had forgotten about that, was working on getting
> > vmsplice into shape the last two days. Interesting that you mention the
> > last page, I'll dig in now! Any more info on this (how did you notice
> > the leak originates from there)?
> 
> I first observed leak via slabinfo data (not a leak, but number of skbs
> did not dropped after quite huge files were transferred), then added
> number of allocated and freed objects in skbuff.c, they did not match
> for big files, so I started to check splice source and found that
> sometimes ->release callback is not called, but code breaks out of the
> loop. I've put some printks in __splice_from_pipe() and found following
> case, when skb is leaked:
> when the same cloned skb was shared multiple times (no more than 2 though),
> only one copy was freed.
> 
> Further analysis description requires some splice background (obvious
> for you, but that clears it for me):
> pipe_buffer only contains 16 pages.
> There is a code, which copies pages (pointers) from spd to pipe_buffer
> (splice_to_pipe()).
> skb allocations happens in chunks of different size (i.e. with different
> number of skbs/pages per call), so it is possible that number of
> allocated skbs will be less than pipe_buffer size (16), and then the
> rest of the pages will be put into different (or the same) pipe_buffer later.
> Sometimes two skbs from above example happens to be on the boundary of
> the pipe buffer, so only one of them is being copied into pipe_buffer,
> which is then transferred over the pipe.
> So, we have a case, when spd has (it had more, but this two are special) 
> 2 pages (actually the same page, but two references to it), but pipe_buffer 
> has a place only for one. In that case second page from spd will be missed.
> 
> So, things turned down to be not in the __splice_from_pipe(), but
> splice_to_pipe(). Attached patch fixes a leak for me.
> It was tested with different data files and all were received correctly.
> 
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index bc481f1..365bfd9 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  				break;
>  			if (pipe->nrbufs < PIPE_BUFFERS)
>  				continue;
> -
> -			break;
>  		}
>  
>  		if (spd->flags & SPLICE_F_NONBLOCK) {
> 

Hmm, curious. If we hit that location, then two conditions are true:

- Pipe is full
- We transferred some data

if you remove the break, then you'll end up blocking in pipe_wait()
(unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
waiting for more room, if we already transferred some data. In that case
we just want to return a short splice. Looking at pipe_write(), it'll
block as well though. Just doesn't seem optimal to me, but...

So the question is why would doing the break there cause a leak? I just
don't yet see how it can happen, I'd love to fix that condition instead.
For the case you describe, we should have page_nr == 1 and spd->nr_pages
== 2. Is the:

        while (page_nr < spd->nr_pages)
                spd->spd_release(spd, page_nr++);

not dropping the right reference?

-- 
Jens Axboe

next prev parent reply	other threads:[~2007-06-15  8:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-11 11:59 [PATCH][RFC] network splice receive v2 Jens Axboe
2007-06-12 18:17 ` Evgeniy Polyakov
2007-06-12 18:17   ` Jens Axboe
2007-06-12 18:18     ` Jens Axboe
2007-06-13  8:01     ` Evgeniy Polyakov
2007-06-13 19:50       ` Jens Axboe
2007-06-14 11:57       ` Evgeniy Polyakov
2007-06-14 11:59         ` Jens Axboe
2007-06-15  8:06           ` Evgeniy Polyakov
2007-06-15  8:43             ` Jens Axboe [this message]
2007-06-15  9:14               ` Evgeniy Polyakov
2007-06-15  9:34                 ` Jens Axboe
2007-06-15  9:44                   ` Evgeniy Polyakov
2007-06-15 13:31                     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070615084318.GP6149@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=johnpol@2ka.mipt.ru \
    --cc=netdev@vger.kernel.org \
    --cc=olaf.kirch@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.