Re: using splice/vmsplice to improve file receive performance

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <jens.axboe@oracle.com>
To: saeed bishara <saeed.bishara@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: using splice/vmsplice to improve file receive performance
Date: Thu, 4 Jan 2007 15:08:13 +0100	[thread overview]
Message-ID: <20070104140813.GZ11203@kernel.dk> (raw)
In-Reply-To: <c70ff3ad0701031209g43cf5576v11b6409696af1ed4@mail.gmail.com>

On Wed, Jan 03 2007, saeed bishara wrote:
> On 12/22/06, Jens Axboe <jens.axboe@oracle.com> wrote:
> >On Fri, Dec 22 2006, saeed bishara wrote:
> >> On 12/22/06, Jens Axboe <jens.axboe@oracle.com> wrote:
> >> >On Fri, Dec 22 2006, saeed bishara wrote:
> >> >> On 12/22/06, Jens Axboe <jens.axboe@oracle.com> wrote:
> >> >> >On Thu, Dec 21 2006, saeed bishara wrote:
> >> >> >> Hi,
> >> >> >> I'm trying to use the splice/vmsplice system calls to improve the
> >> >> >> samba server write throughput, but before touching the smbd, I 
> >started
> >> >> >> to improve the ttcp tool since it simple and has the same flow. I'm
> >> >> >> expecting to avoid the "copy_from_user" path when using those
> >> >> >> syscalls.
> >> >> >> so far, I couldn't make any improvement, actually the throughput 
> >get
> >> >> >> worst. the new receive flow looks like this (code also attached):
> >> >> >> 1. read tcp packet (64 pages) to page aligned buffer.
> >> >> >> 2. vmsplice the buffer to pipe with SPLICE_F_MOVE.
> >> >> >> 3. splice the pipe to the file, also with SPLICE_F_MOVE.
> >> >> >>
> >> >> >> the strace shows that the splice takes a lot of time. also when
> >> >> >> profiling the kernel, I found that the memcpy() called to often !!
> >> >> >
> >> >> >(didn't see this until now, axboe@suse.de doesn't work anymore)
> >> >> >
> >> >> >I'm assuming that you mean you vmsplice with SPLICE_F_GIFT, to hand
> >> >> >ownership of the pages to the kernel (in which case SPLICE_F_MOVE 
> >will
> >> >> >work, otherwise you get a copy)? If not, that'll surely cost you a 
> >data
> >> >> >copy
> >> >>   I'll try the vmplice with SPLICE_F_GIFT and splice with MOVE. btw,
> >> >> I noticed that the  splice system call takes the bulk of the time,
> >> >> does it mean anything?
> >> >
> >> >Hard to say without seeing some numbers :-)
> >> I'm out of the office, I'll send it later. btw, my test bed ( the
> >> receiver side ) is arm9. does it matter?
> >
> >The vmsplice is basically vm intensive, so it could matter.
> >
> >> >> >This sounds remarkably like a recent thread on lkml, you may want to
> >> >> >read up on that. Basically using splice for network receive is a bit 
> >of
> >> >> >a work-around now, since you do need the one copy and then vmsplice 
> >that
> >> >> >into a pipe. To realize the full potential of splice, we first need
> >> >> >socket receive support so you can skip that step (splice from socket 
> >to
> >> >> >pipe, splice pipe to file).
> >> >> Ashwini Kulkarni posted patches that implements that, see
> >> >> http://lkml.org/lkml/2006/9/20/272 .  is that right?
> >> >> >
> >> >> >There was no test code attached, btw.
> >> >> sorry, here it is.
> >> >> can you please add sample application to your test tools (splice,fio
> >> >> ,,) that demonstrates my flow; socket to file using read & vmsplice?
> >> >
> >> >I didn't add such an example, since I had hoped that we would have
> >> >splice from socket support sooner rather than later. But I can do so, of
> >> >course.
> >> do you any preliminary patches? I can start playing with it.
> >
> >I don't, Intel posted a set of patches a few months ago though. I didn't
> >have time to look that at the time being, but you should be able to find
> >them in the archives.
> >
> >> >I'll try your test. One thing that sticks out initially is that you
> >> >should be using full pages, the splice pipe will not merge page
> >> >segments. So don't use a buflen less than the page size.
> >>
> >> yes, actually I  run the ttcp with -l65536 ( 64KB ), and the buffer is
> >> always page aligned.also, the splice/vmsplice with MOVE or GIFT will
> >> fail if the buffer is not a whole pages. am I rigth?
> >
> >Yes.
> >
> >I added a simple splice-fromnet example in the splice git repo, see if
> >you can repeat your results with that. Doing:
> >
> ># ./splice-fromnet -g 2001 | ./splice-out -m /dev/null
> >
> >and
> >
> ># cat /dev/zero | netcat localhost 2001
> >
> >gets me about 490MiB/sec, using a recv/write loop is around 413MiB/sec.
> >Not migrating pages gets me around 422MiB/sec.
> >
> >--
> >Jens Axboe
> >
> >
> I've done some investigation in the splice flow and found the following:
> even when using vmsplice with GIFT and splice with MOVE, the user
> buffers still copied, I see that the memcpy from pipe_to_file() is
> called.
> I added debug messages in this function and here what I got:
> 1. the  generic_pipe_buf_steal always fails, this is because the
> page_count is 2.
> 2. after then, the find_lock_page fails as well.
> 3. page_cache_alloc_cold succeeds.
> 4. but, since the buf->page is differs from the page (returned by
> page_cache_alloc_cold) the memcpy function is called.
> 
> this behavior true for all the buffers that vmspliced to ext3 file.
> is this the expected behavior? is there any way to make the steal
> operation return with success?

It works for me, with most pages. Using the vmsplice/splice-out from the
splice tools, doing

$ ./vmsplice -g |  ./splice-out -m g

about half of the pages have count==1 and the steal suceeds.

find_lock_page() will only suceed, if the file exists and is cached
already. splice-out will truncate the file, so it should never suceed
for that case. For both the find_lock_page() success and failure case
(page being allocated), it's a given that we need to copy the data.

-- 
Jens Axboe

next prev parent reply	other threads:[~2007-01-04 14:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-21 19:21 using splice/vmsplice to improve file receive performance saeed bishara
2006-12-22  9:48 ` Jens Axboe
2006-12-22 11:18   ` saeed bishara
2006-12-22 11:39     ` Jens Axboe
2006-12-22 11:59       ` saeed bishara
2006-12-22 12:47         ` Jens Axboe
2007-01-03 20:09           ` saeed bishara
2007-01-04 14:08             ` Jens Axboe [this message]
2007-01-04 14:16               ` Jens Axboe
2007-01-04 16:27                 ` saeed bishara
2007-01-04 17:38                   ` saeed bishara
2007-01-07 18:16 ` saeed bishara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070104140813.GZ11203@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=saeed.bishara@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.