netdev.vger.kernel.org archive mirror
* Best way to reduce system call overhead for tun device I/O?
@ 2016-03-29 22:40 Guus Sliepen
  2016-03-31 21:18 ` Tom Herbert
From: Guus Sliepen @ 2016-03-29 22:40 UTC (permalink / raw)
  To: netdev

I'm trying to reduce system call overhead when reading/writing to/from a
tun device in userspace. For sockets, one can use sendmmsg()/recvmmsg(),
but a tun fd is not a socket fd, so this doesn't work. I see several
options to allow userspace to read/write multiple packets with one
syscall:

- Implement a TX/RX ring buffer that is mmap()ed, like with AF_PACKET
  sockets.

- Implement an ioctl() to emulate sendmmsg()/recvmmsg().

- Add a flag that can be set using TUNSETIFF that makes regular
  read()/write() calls handle multiple packets in one go.

- Expose a socket fd to userspace, so regular sendmmsg()/recvmmsg() can
  be used. There is tun_get_socket() which is used internally in the
  kernel, but this is not exposed to userspace, and exposing it doesn't
  look trivial either.

What would be the right way to do this?
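
For context, the per-packet pattern I'm trying to get away from looks
like this (a minimal sketch, error handling and interface configuration
mostly omitted; the interface name is just an example):

/* Minimal per-packet tun read loop: one syscall per packet. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
	struct ifreq ifr;
	char buf[2048];
	int fd;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return 1;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;	/* raw IP packets, no extra header */
	strncpy(ifr.ifr_name, "tun0", IFNAMSIZ - 1);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0)
		return 1;

	for (;;) {
		/* Exactly one packet per read(): the syscall rate equals
		 * the packet rate, which is the overhead in question. */
		ssize_t n = read(fd, buf, sizeof(buf));
		if (n <= 0)
			break;
		/* ... process the packet in buf[0..n) ... */
	}
	return 0;
}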

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus@tinc-vpn.org>

* Re: Best way to reduce system call overhead for tun device I/O?
@ 2016-04-04 13:35 ValdikSS
  2016-04-04 17:28 ` Stephen Hemminger
From: ValdikSS @ 2016-04-04 13:35 UTC (permalink / raw)
  To: Guus Sliepen
  Cc: Stephen Hemminger, Willem de Bruijn, David Miller, Tom Herbert,
	netdev

I'm trying to increase OpenVPN throughput by optimizing its tun I/O, too.
Right now I have more questions than answers.

I get about 800 Mbit/s through OpenVPN with authentication and encryption disabled on a local machine, with the OpenVPN server and client running in different
network namespaces connected by veth, with a 1500 MTU on the TUN interface. This is rather limiting. Low-end devices like SOHO routers with a 560 MHz CPU only
achieve 15-20 Mbit/s via OpenVPN with encryption.
Increasing the MTU reduces the overhead: you can get over 5 Gbit/s with a 16000 MTU on the TUN interface.
This is not OpenVPN-specific. None of the tunneling software I tried achieves gigabit speeds without encryption on my machine with MTU 1500. I didn't test
tinc, though.

TUN supports various offloading techniques, just like hardware NICs: GSO, TSO, UFO. From what I understand, with GSO/GRO on a TUN interface with MTU 1500, we
could send and receive many small packets combined into one huge packet with a single send/recv call, and the performance should be just as it is now with an
increased MTU. But there is very little information on how to use offloading with TUN.
I've found some old example code which creates a TUN interface with GSO support (TUN_VNET_HDR), does NAT, and echoes TUN data to stdout, plus a script that runs
two instances of it connected with a pipe. But it doesn't work for me: I never see any combined frames (gso_type is always 0 in the virtio_net_hdr header). I
probably did something wrong, but I'm not sure what exactly.

Here's said application: http://ovrload.ru/f/68996_tun.tar.gz
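
For reference, here is how I understand the setup is supposed to look,
based on my reading of linux/if_tun.h (a sketch only, not verified --
the offload flag combination is just what I tried, and error handling
is minimal):

/* Create a tun fd that prefixes frames with struct virtio_net_hdr,
 * and tell the kernel we can handle GSO'd frames. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int tun_alloc_gso(const char *name)
{
	struct ifreq ifr;
	unsigned int offloads;
	int fd;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	/* IFF_VNET_HDR makes every read()/write() carry a
	 * struct virtio_net_hdr in front of the packet. */
	ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_VNET_HDR;
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0)
		goto fail;

	/* Advertise that userspace accepts offloaded frames.
	 * TUN_F_CSUM is a prerequisite for the TSO/UFO flags. */
	offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_UFO;
	if (ioctl(fd, TUNSETOFFLOAD, offloads) < 0)
		goto fail;

	return fd;
fail:
	close(fd);
	return -1;
}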

The questions are as follows:

 1. Do I understand correctly that GSO/GRO would have the same effect as increasing the MTU on a TUN interface?
 2. How are GRO/GSO different from TSO and UFO?
 3. Can we receive and send combined frames directly from/to a NIC with offloading support?
 4. How does one implement GRO/GSO, TSO and UFO? What should the logic be? (See the read-side sketch below.)
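
To make question 4 concrete, this is how I understand the read side is
supposed to work once IFF_VNET_HDR is set (again a sketch from my
reading of linux/virtio_net.h, not verified -- which may be exactly
where my mistake is):

#include <unistd.h>
#include <linux/virtio_net.h>

/* With IFF_VNET_HDR, each read() returns one frame prefixed by a
 * struct virtio_net_hdr describing any pending segmentation. */
static void read_one(int fd)
{
	struct {
		struct virtio_net_hdr hdr;	/* prepended by the kernel */
		char data[65536];
	} buf;
	ssize_t n = read(fd, &buf, sizeof(buf));

	if (n < (ssize_t)sizeof(buf.hdr))
		return;

	if (buf.hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
		/* A combined frame: buf.hdr.gso_size is the MSS it must be
		 * re-segmented to before it hits a 1500-MTU wire. This is
		 * the case I never see -- gso_type stays 0 for me. */
	}
	/* buf.data[0 .. n - sizeof(buf.hdr)) holds the packet itself. */
}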


Any reply is greatly appreciated.

P.S. This could be helpful: https://ldpreload.com/p/tuntap-notes.txt

> I'm trying to reduce system call overhead when reading/writing to/from a
> tun device in userspace. For sockets, one can use sendmmsg()/recvmmsg(),
> but a tun fd is not a socket fd, so this doesn't work. I see several
> options to allow userspace to read/write multiple packets with one
> syscall:
>
> - Implement a TX/RX ring buffer that is mmap()ed, like with AF_PACKET
>   sockets.
>
> - Implement an ioctl() to emulate sendmmsg()/recvmmsg().
>
> - Add a flag that can be set using TUNSETIFF that makes regular
>   read()/write() calls handle multiple packets in one go.
>
> - Expose a socket fd to userspace, so regular sendmmsg()/recvmmsg() can
>   be used. There is tun_get_socket() which is used internally in the
>   kernel, but this is not exposed to userspace, and exposing it doesn't
>   look trivial either.
>
> What would be the right way to do this?
>
> -- 
> Met vriendelijke groet / with kind regards,
>      Guus Sliepen <guus@tinc-vpn.org>



Thread overview: 10+ messages
2016-03-29 22:40 Best way to reduce system call overhead for tun device I/O? Guus Sliepen
2016-03-31 21:18 ` Tom Herbert
2016-03-31 21:20   ` David Miller
2016-03-31 22:28     ` Guus Sliepen
2016-03-31 23:39       ` Stephen Hemminger
2016-04-03 23:03         ` Willem de Bruijn
2016-04-04 14:40           ` Guus Sliepen
  -- strict thread matches above, loose matches on Subject: below --
2016-04-04 13:35 ValdikSS
2016-04-04 17:28 ` Stephen Hemminger
     [not found] <57026C8F.8050406@valdikss.org.ru>
2016-04-04 14:31 ` Guus Sliepen
