* UDP splice

From: Ricardo Landim @ 2013-06-21 10:04 UTC
To: netdev

Hi folks,

I am developing an RTP proxy for VoIP applications and tried to use the
splice() syscall for zero copy. I am trying to splice UDP data into a pipe
and then splice from the pipe to a UDP socket.

I read some documentation on splice() and, reading the kernel source code,
I saw these lines in net/ipv4/af_inet.c:

const struct proto_ops inet_stream_ops = {
	...
	.splice_read = tcp_splice_read,
	...
};

const struct proto_ops inet_dgram_ops = {
	...
	...
};

There is an implementation of splice for TCP sockets but not for UDP sockets.

My question is: is there some limitation in UDP sockets that prevents this
implementation?

Regards,
Ricardo Landim
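For illustration, a minimal user-space sketch of the approach described above:
splicing from one UDP socket into a pipe and from the pipe out to another UDP
socket. The descriptor names udp_in and udp_out are placeholders for already
connected sockets, not part of the original mail. On kernels of this era the
first splice() should fail (EINVAL) because inet_dgram_ops has no .splice_read
hook, which is exactly the question being asked here.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Forward one chunk from udp_in to udp_out through a pipe.
 * Returns bytes forwarded, or -1 on error. */
static ssize_t udp_forward_splice(int udp_in, int udp_out)
{
	int pipefd[2];
	ssize_t n, m;

	if (pipe(pipefd) < 0)
		return -1;

	/* socket -> pipe: this is the step that would need a .splice_read
	 * implementation for UDP, and therefore fails on these kernels */
	n = splice(udp_in, NULL, pipefd[1], NULL, 65536, SPLICE_F_MOVE);
	if (n > 0) {
		/* pipe -> socket: this direction goes through the socket's
		 * sendpage path and should work, as discussed below */
		m = splice(pipefd[0], NULL, udp_out, NULL, n, SPLICE_F_MOVE);
		if (m < 0)
			n = -1;
	}

	close(pipefd[0]);
	close(pipefd[1]);
	return n;
}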
* Re: UDP splice

From: Ben Hutchings @ 2013-06-24 15:42 UTC
To: Ricardo Landim; +Cc: netdev

On Fri, 2013-06-21 at 07:04 -0300, Ricardo Landim wrote:
> Hi folks,
>
> I am developing an RTP proxy for VoIP applications and tried to use the
> splice() syscall for zero copy. I am trying to splice UDP data into a pipe
> and then splice from the pipe to a UDP socket.
>
> I read some documentation on splice() and, reading the kernel source code,
> I saw these lines in net/ipv4/af_inet.c:
>
> const struct proto_ops inet_stream_ops = {
> 	...
> 	.splice_read = tcp_splice_read,
> 	...
> };
>
> const struct proto_ops inet_dgram_ops = {
> 	...
> 	...
> };
>
> There is an implementation of splice for TCP sockets but not for UDP sockets.
>
> My question is: is there some limitation in UDP sockets that prevents this
> implementation?

splice() works with streams, but UDP is a message-oriented protocol.
How would a UDP implementation of splice() decide where to put the
message boundaries, or how to distinguish the messages?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: UDP splice

From: Hannes Frederic Sowa @ 2013-06-24 15:51 UTC
To: Ben Hutchings; +Cc: Ricardo Landim, netdev

On Mon, Jun 24, 2013 at 04:42:34PM +0100, Ben Hutchings wrote:
> On Fri, 2013-06-21 at 07:04 -0300, Ricardo Landim wrote:
> > I am developing an RTP proxy for VoIP applications and tried to use the
> > splice() syscall for zero copy. I am trying to splice UDP data into a
> > pipe and then splice from the pipe to a UDP socket.
> >
> > There is an implementation of splice for TCP sockets but not for UDP
> > sockets.
> >
> > My question is: is there some limitation in UDP sockets that prevents
> > this implementation?
>
> splice() works with streams, but UDP is a message-oriented protocol.
> How would a UDP implementation of splice() decide where to put the
> message boundaries, or how to distinguish the messages?

Splicing from a pipe to a UDP socket should work nonetheless. Splicing from
a UDP socket to a pipe does not work.

Greetings,

  Hannes
* Re: UDP splice

From: Ben Hutchings @ 2013-06-24 16:02 UTC
To: Hannes Frederic Sowa; +Cc: Ricardo Landim, netdev

On Mon, 2013-06-24 at 17:51 +0200, Hannes Frederic Sowa wrote:
> On Mon, Jun 24, 2013 at 04:42:34PM +0100, Ben Hutchings wrote:
> > splice() works with streams, but UDP is a message-oriented protocol.
> > How would a UDP implementation of splice() decide where to put the
> > message boundaries, or how to distinguish the messages?
>
> Splicing from a pipe to a UDP socket should work nonetheless. Splicing
> from a UDP socket to a pipe does not work.

Should it? I suppose it could work for UDP-based protocols that include
their own framing, but I can't see that it's generally useful.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: UDP splice

From: Hannes Frederic Sowa @ 2013-06-24 17:01 UTC
To: Ben Hutchings; +Cc: Ricardo Landim, netdev

On Mon, Jun 24, 2013 at 05:02:56PM +0100, Ben Hutchings wrote:
> On Mon, 2013-06-24 at 17:51 +0200, Hannes Frederic Sowa wrote:
> > On Mon, Jun 24, 2013 at 04:42:34PM +0100, Ben Hutchings wrote:
> > > splice() works with streams, but UDP is a message-oriented protocol.
> > > How would a UDP implementation of splice() decide where to put the
> > > message boundaries, or how to distinguish the messages?
> >
> > Splicing from a pipe to a UDP socket should work nonetheless. Splicing
> > from a UDP socket to a pipe does not work.
>
> Should it? I suppose it could work for UDP-based protocols that include
> their own framing, but I can't see that it's generally useful.

It is the same as with sendfile(): one splice() call would result in one UDP
message and could span several IP fragments, so message boundaries would be
maintained. splice() is good for large data, but the generation of fragments
defeats its use here.

I don't know what would be most suitable for copying UDP datagrams from one
socket to another. I would have tried recvmmsg()/sendmmsg().

Greetings,

  Hannes
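A minimal sketch of the recvmmsg()/sendmmsg() forwarding loop suggested
above. The descriptor names udp_in and udp_out are placeholders for connected
UDP sockets, and the batch size and buffer size are arbitrary; error handling
and partial-send handling are omitted.

#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

#define BATCH 32
#define MTU   1500

/* Forward one batch of datagrams from udp_in to udp_out (both assumed to be
 * connected UDP sockets).  Returns datagrams sent, 0/-1 on nothing/error. */
static int udp_forward_batch(int udp_in, int udp_out)
{
	static char bufs[BATCH][MTU];
	struct iovec iov[BATCH];
	struct mmsghdr msgs[BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len  = MTU;
		msgs[i].msg_hdr.msg_iov    = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* one syscall pulls in up to BATCH datagrams */
	n = recvmmsg(udp_in, msgs, BATCH, 0, NULL);
	if (n <= 0)
		return n;

	/* shrink each iovec to the length actually received, so each
	 * sendmmsg entry emits exactly one datagram of the right size */
	for (i = 0; i < n; i++)
		iov[i].iov_len = msgs[i].msg_len;

	/* one syscall pushes the whole batch back out */
	return sendmmsg(udp_out, msgs, n, 0);
}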
* Re: UDP splice

From: Ricardo Landim @ 2013-06-24 17:09 UTC
To: Ben Hutchings, Ricardo Landim, netdev

Useful for a UDP proxy, mainly for RTP payloads carrying voice (VoIP) and
live video streams...

2013/6/24 Hannes Frederic Sowa <hannes@stressinduktion.org>:
> On Mon, Jun 24, 2013 at 05:02:56PM +0100, Ben Hutchings wrote:
>> Should it? I suppose it could work for UDP-based protocols that include
>> their own framing, but I can't see that it's generally useful.
>
> It is the same as with sendfile(): one splice() call would result in one
> UDP message and could span several IP fragments, so message boundaries
> would be maintained. splice() is good for large data, but the generation
> of fragments defeats its use here.
>
> I don't know what would be most suitable for copying UDP datagrams from
> one socket to another. I would have tried recvmmsg()/sendmmsg().
>
> Greetings,
>
>   Hannes
* Re: UDP splice

From: Eric Dumazet @ 2013-06-24 17:53 UTC
To: Ricardo Landim; +Cc: Ben Hutchings, netdev

On Mon, 2013-06-24 at 14:09 -0300, Ricardo Landim wrote:
> Useful for a UDP proxy, mainly for RTP payloads carrying voice (VoIP) and
> live video streams...

I really doubt splice() could help VoIP streams; they usually send one
small frame every 20 ms. Implementing splice() for such small frames
would not be really useful.

For video, I would rather use a kernel helper (netfilter/conntrack?) to
avoid the userland switch...
* Re: UDP splice

From: Ricardo Landim @ 2013-06-24 18:08 UTC
To: Eric Dumazet; +Cc: Ben Hutchings, netdev

It would help with zero copy and with the cost of syscalls.

On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a UDP
socket (proxying one packet) takes ~40000 cycles (~12 us).

2013/6/24 Eric Dumazet <eric.dumazet@gmail.com>:
> On Mon, 2013-06-24 at 14:09 -0300, Ricardo Landim wrote:
>> Useful for a UDP proxy, mainly for RTP payloads carrying voice (VoIP)
>> and live video streams...
>
> I really doubt splice() could help VoIP streams; they usually send one
> small frame every 20 ms. Implementing splice() for such small frames
> would not be really useful.
>
> For video, I would rather use a kernel helper (netfilter/conntrack?) to
> avoid the userland switch...
* Re: UDP splice

From: Eric Dumazet @ 2013-06-24 18:33 UTC
To: Ricardo Landim; +Cc: Ben Hutchings, netdev

On Mon, 2013-06-24 at 15:08 -0300, Ricardo Landim wrote:
> It would help with zero copy and with the cost of syscalls.
>
> On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a UDP
> socket (proxying one packet) takes ~40000 cycles (~12 us).

splice() won't reduce the number of syscalls. If the payload is small,
copying it costs less than 1000 cycles, so it is a small part of the cost
you measured.
* Re: UDP splice

From: David Miller @ 2013-06-24 18:39 UTC
To: ricardolan; +Cc: eric.dumazet, bhutchings, netdev

From: Ricardo Landim <ricardolan@gmail.com>
Date: Mon, 24 Jun 2013 15:08:25 -0300

> It would help with zero copy and with the cost of syscalls.
>
> On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a UDP
> socket (proxying one packet) takes ~40000 cycles (~12 us).

Stop top posting, please.
* Re: UDP splice

From: Rick Jones @ 2013-06-24 21:33 UTC
To: Ricardo Landim; +Cc: Eric Dumazet, Ben Hutchings, netdev

On 06/24/2013 11:08 AM, Ricardo Landim wrote:
> It would help with zero copy and with the cost of syscalls.
>
> On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a UDP
> socket (proxying one packet) takes ~40000 cycles (~12 us).

Are you quite certain your Xeon was actually running at 3.3 GHz at the time?
I just did a quick netperf UDP_RR test between an old Centrino-based laptop
(HP 8510w) pegged at 1.6 GHz (cpufreq-set) and it was reporting a service
demand of 12.2 microseconds per transaction, which is, basically, a send and
recv pair plus stack:

root@raj-8510w:~# netperf -t UDP_RR -c -i 30,3 -H tardy.usa.hp.com -- -r 140,1
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 () port 0 AF_INET to
tardy.usa.hp.com () port 0 AF_INET : +/-2.500% @ 99% conf. : demo : first
burst 0
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 1.120%
!!!                       Local CPU util  : 6.527%
!!!                       Remote CPU util : 0.000%

Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % U    us/Tr   us/Tr

180224 180224 140     1      10.00   12985.58  7.93  -1.00  12.221  -1.000
212992 212992

(Don't fret too much about the confidence intervals bit; it almost made it.)

Also, my 1400-byte test didn't have all that different a service demand:

root@raj-8510w:~# netperf -t UDP_RR -c -i 30,3 -H tardy.usa.hp.com -- -r 1400,1
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 () port 0 AF_INET to
tardy.usa.hp.com () port 0 AF_INET : +/-2.500% @ 99% conf. : demo : first
burst 0
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 1.123%
!!!                       Local CPU util  : 6.991%
!!!                       Remote CPU util : 0.000%

Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % U    us/Tr   us/Tr

180224 180224 1400    1      10.00   10055.33  6.27  -1.00  12.469  -1.000
212992 212992

Of course, I didn't try very hard to force cache misses (e.g. by using a big
send/recv ring), and there may have been other things happening on the system
causing a change between the two tests (separated by an hour or so). I didn't
make sure that interrupts stayed assigned to a specific CPU, nor that netperf
did. The kernel:

root@raj-8510w:~# uname -a
Linux raj-8510w 3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:30 UTC 2013
i686 i686 i686 GNU/Linux

In general, if you want to quantify the overhead of copies, you can try
something like the two tests above, but for longer run times and with more
intermediate data points, as you walk the request or response size up. Watch
the change in service demand as you go. So long as you stay below 1472 bytes
(assuming IPv4 over a "standard" 1500 byte MTU Ethernet) you won't generate
fragments, and so will still have the same number of packets per transaction.

Or you could "perf" profile and look for copy routines.

happy benchmarking,

rick jones
* Re: UDP splice

From: Ricardo Landim @ 2013-06-25 00:04 UTC
To: Rick Jones; +Cc: Eric Dumazet, Ben Hutchings, netdev

2013/6/24 Rick Jones <rick.jones2@hp.com>:
> On 06/24/2013 11:08 AM, Ricardo Landim wrote:
>> It would help with zero copy and with the cost of syscalls.
>>
>> On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a
>> UDP socket (proxying one packet) takes ~40000 cycles (~12 us).
>
> Are you quite certain your Xeon was actually running at 3.3 GHz at the
> time? I just did a quick netperf UDP_RR test between an old Centrino-based
> laptop (HP 8510w) pegged at 1.6 GHz (cpufreq-set) and it was reporting a
> service demand of 12.2 microseconds per transaction, which is, basically,
> a send and recv pair plus stack:
>
> [...]
>
> In general, if you want to quantify the overhead of copies, you can try
> something like the two tests above, but for longer run times and with more
> intermediate data points, as you walk the request or response size up.
> Watch the change in service demand as you go. So long as you stay below
> 1472 bytes (assuming IPv4 over a "standard" 1500 byte MTU Ethernet) you
> won't generate fragments, and so will still have the same number of
> packets per transaction.
>
> Or you could "perf" profile and look for copy routines.
>
> happy benchmarking,
>
> rick jones

I ran some tests on the read/write operations...

fds: fd_in and fd_out are UDP sockets
events: epoll

1) read

code:

	...
	cycles = rdtsc();
	r = recvfrom(fd_in, rtp_buf, 8192, 0, si, &soi);
	cycles = rdtsc() - cycles;
	...

result:

	Cycles best:    2715
	Cycles worst:   59771
	Cycles average: 11587

2) write

code:

	...
	cycles = rdtsc();
	w = sendto(fd_out, rtp_buf, r, 0, so, soo);
	cycles = rdtsc() - cycles;
	...

result:

	Cycles best:    6501
	Cycles worst:   75455
	Cycles average: 25496

Kernel:

# uname -a
Linux host49-250 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012
x86_64 x86_64 x86_64 GNU/Linux

CPU:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
stepping        : 9
microcode       : 0x12
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow
vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 6599.78
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
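The rdtsc() call in the snippets above is not a libc function; a minimal
self-contained sketch of the measurement is below. The helper, the buffer
size and the connect()ed fd_out socket are assumptions for illustration, not
the original test code. One point worth noting: on this CPU the TSC runs at a
constant rate (constant_tsc in the flags above), so with the core clocked
down to 1600 MHz the measured deltas reflect wall-clock time at the nominal
3.3 GHz rate rather than cycles actually executed by the slowed core.

#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* Read the time stamp counter. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Forward one datagram from fd_in to fd_out and report the TSC deltas,
 * mirroring the measurement above; fd_out is assumed to be connect()ed. */
static void time_one_forward(int fd_in, int fd_out)
{
	char rtp_buf[8192];
	struct sockaddr_storage src;
	socklen_t srclen = sizeof(src);
	uint64_t t0, t1, t2;
	ssize_t r, w;

	t0 = rdtsc();
	r  = recvfrom(fd_in, rtp_buf, sizeof(rtp_buf), 0,
	              (struct sockaddr *)&src, &srclen);
	t1 = rdtsc();
	if (r < 0)
		return;

	w  = send(fd_out, rtp_buf, (size_t)r, 0);
	t2 = rdtsc();

	printf("recvfrom: %llu cycles, send: %llu cycles (w=%zd)\n",
	       (unsigned long long)(t1 - t0),
	       (unsigned long long)(t2 - t1), w);
}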
* Re: UDP splice

From: David Miller @ 2013-06-24 18:38 UTC
To: ricardolan; +Cc: bhutchings, netdev

From: Ricardo Landim <ricardolan@gmail.com>
Date: Mon, 24 Jun 2013 14:09:19 -0300

> Useful for a UDP proxy, mainly for RTP payloads carrying voice (VoIP) and
> live video streams...

Do not top post, thank you.