From: Eric Dumazet
Subject: Re: zero copy for relay server
Date: Tue, 29 Mar 2011 06:23:10 +0200
Message-ID: <1301372590.2506.57.camel@edumazet-laptop>
References: <1301331138.3182.43.camel@edumazet-laptop> <1301337257.2506.8.camel@edumazet-laptop>
Cc: Viral Mehta, "netdev@vger.kernel.org"
To: Changli Gao
Sender: netdev-owner@vger.kernel.org

On Tuesday 29 March 2011 at 10:00 +0800, Changli Gao wrote:
> On Tue, Mar 29, 2011 at 2:34 AM, Eric Dumazet wrote:

> I think he concerns the overhead of system calls. In order to omit a
> system call, I think you can implement sth. like this:
>
> splice2(infd, outfd, pipefd, ...)
>

Yes, but since no numbers have been given and no code has been written
yet, I am asking the question. Giving 4 file descriptors to a single
syscall sounds convoluted.

> What you need do is maintaining pipes by yourself.
>
> >> 2. I believe underlying PIPE that we are using will also have some size limit
> >> (like in user space 4K or 64K, not sure)
> >
> > What kind of socket is able to deliver more than 64K frames ?
>
> You can enlarge the size with fcntl(pipefd, F_SETPIPE_SZ,...).
>

Not really useful, since splice() internals use automatic arrays sized
with PIPE_DEF_BUFFERS.
You can enlarge the size of the pipe, but we are still limited to at
most 64KB in skb_splice_bits(), for example [on x86 with its 4KB pages].

This doesn't matter, since skbs are limited to 16 pages anyway (or 64KB).

F_SETPIPE_SZ can only increase the size of the pipe ring buffer (which
should be empty or contain at most one skb), therefore increasing
dcache needs.

> >
> > sendfile() is based on top of splice(), but it's faster to use splice().
> >
>
> Why? Thanks.
>

The real cost is not syscall overhead, but context switches and cache
misses.

Adding a "super syscall" adds kernel text and increases icache misses on
real machines (I am not talking about machines used in micro
benchmarks).

Most likely, GRO can significantly speed up this workload, while
syscall avoidance won't.
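
For reference, here is a minimal sketch of the two-call pattern being
discussed: one splice() from the input descriptor into a pipe the
application maintains itself, then splice() from that pipe to the output
descriptor. The function name relay_once() and its parameters are
hypothetical and are not taken from any posted patch. Error handling is
deliberately thin; treat this as an illustration under those
assumptions, not a tuned relay loop.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: forward up to `len` bytes from infd to outfd
 * through a caller-owned pipe (pipefd[0] = read end, pipefd[1] = write end).
 * Returns the number of bytes forwarded, 0 on end of input, -1 on error
 * (errno is set by splice(); the caller may retry on EINTR/EAGAIN).
 */
ssize_t relay_once(int infd, int outfd, int pipefd[2], size_t len)
{
    /* First syscall: input -> pipe. May move less than len. */
    ssize_t in = splice(infd, NULL, pipefd[1], NULL, len,
                        SPLICE_F_MOVE | SPLICE_F_MORE);
    if (in <= 0)
        return in;

    /* Second syscall (looped for short writes): pipe -> output. */
    ssize_t left = in;
    while (left > 0) {
        ssize_t out = splice(pipefd[0], NULL, outfd, NULL, (size_t)left,
                             SPLICE_F_MOVE | SPLICE_F_MORE);
        if (out < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (out == 0)
            return -1; /* pipe unexpectedly drained */
        left -= out;
    }
    return in;
}
```

The pipe is created once with pipe() and reused for every call, so it
is empty again after each successful return. That matches the remark
above that the pipe should be empty or hold at most one skb.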