From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: Network Stack discussion notes from 2015 DPDK Userspace
Date: Mon, 12 Oct 2015 11:50:54 +0300
Message-ID: <561B746E.4070807@scylladb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
To: "Wiles, Keith", "dev@dpdk.org"
Received: from mail-lb0-f170.google.com (mail-lb0-f170.google.com [209.85.217.170]) by dpdk.org (Postfix) with ESMTP id CC3238E7D for ; Mon, 12 Oct 2015 10:50:56 +0200 (CEST)
Received: by lbbk10 with SMTP id k10so23670775lbb.0 for ; Mon, 12 Oct 2015 01:50:56 -0700 (PDT)
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 10/10/2015 02:19 AM, Wiles, Keith wrote:
> Here are the notes I can remember from the DPDK Network Stack discussion; please help me fill in anything I missed.
>
> Items I remember we talked about:
>
> * The only reason for a DPDK TCP/IP stack is for performance and possibly lower latency
>     * Meaning the developer is willing to re-write or write his application to get the best performance.
> * A TCP/IPv4/v6 stack is the minimum stack we need to support applications linked with DPDK.
>     * SCTP is also another protocol that may be required
>     * TCP is the primary protocol, usage model for most use cases
>     * Stack must be able to terminate TCP traffic to an application linked to DPDK
> * For DPDK the customer is looking for fast applications and is willing to write the application just for DPDK network stack
>     * Converting an existing application could be done, but the design is for performance and may require a lot of changes to an application
>     * Using an application API that is not Socket is fine for high performance and may be the only way we get best performance.
>     * Need to supply a Socket layer interface as an option if customer is willing to take a performance hit instead of rewriting the application
> * Native application acceleration is desired, but not required when using DPDK network stack
> * We have two projects related to network stack in DPDK
>     * The first one is porting some TCP/IP stack to DPDK plus it needs to give a reasonable performance increase over native Linux applications
>         * The stack code needs to be BSD/MIT like licensed (Open Sourced)
>         * The stack should be up to date with the latest RFCs or at least close
>         * A stack could be written for DPDK (not using an existing code base) and its environment for best performance
>         * Need to be able to configure the DPDK stack(s) from the Linux command line tools if possible
>         * Need a DPDK specific application layer API for application to interface with the network stack
>         * Could have a socket layer API on top of the specific API for applications needing to use sockets (not expected to be the best performance)
>     * The second item is figuring out a new IPC for East/West traffic within the same system.
>         * The design needs to improve performance between applications and be transparent to the application when the remote end is not on the same system.
>         * The new IPC path should be agnostic to local or remote end points
>         * Needs to be very fast compared to current Linux IPC designs. (Will OVS work here?)

Basically, seastar [1] matches this exactly. Its TCP stack, unlike most
stacks, is sharded -- there is a separate stack running on each core
(but with a single IP address), no locking, zero-copy for both transmit
and receive. It has a fast IPC between cores (all data sharing in
seastar is via IPC queues; locks or atomic RMW operations are not
used). There is also an RPC subsystem that can be used for inter-node
communications.
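
To make the "IPC queues without locks or atomic RMW operations" point a bit more concrete, here is a minimal sketch of a single-producer/single-consumer ring. This is an illustration only and not seastar's actual code: the names SpscRing, try_push, try_pop and demo_round_trip are made up for this example. The assumption is that exactly one core pushes and exactly one core pops, so plain atomic loads and stores with acquire/release ordering are enough.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical single-producer/single-consumer ring (not seastar's code).
// Only the producer writes head_, only the consumer writes tail_, so no
// locks and no atomic read-modify-write instructions are needed.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side only. Returns false when the ring is full.
    bool try_push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;
        slots_[head & (N - 1)] = v;
        // Release: the slot write becomes visible before the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T v = slots_[tail & (N - 1)];
        // Release: the slot read completes before the slot is handed back.
        tail_.store(tail + 1, std::memory_order_release);
        return v;
    }

private:
    std::array<T, N> slots_{};
    // Separate cache lines so the two cores do not false-share the indices.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Single-threaded walk-through: fill the ring, then drain it in FIFO order.
// In real use try_push and try_pop would run on two different cores.
std::uint64_t demo_round_trip() {
    SpscRing<std::uint64_t, 8> ring;
    for (std::uint64_t i = 0; i < 8; ++i) {
        bool pushed = ring.try_push(i);
        assert(pushed);
        (void)pushed;
    }
    std::uint64_t sum = 0;
    while (auto v = ring.try_pop()) {
        sum += *v;
    }
    return sum;
}
```

With one such ring per ordered pair of cores, each core can hand work to any other core without ever taking a lock; that per-pair layout is an assumption of this sketch, not a description of seastar internals.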
We've seen 7X performance improvements over the Linux TCP stack when
coding a simple HTTP server.

Of course, it's not all roses. Seastar is written in C++, and the higher
layers are asynchronous, so there's a high barrier to entry for dpdk
developers. Maybe it can't be merged outright, but perhaps it can
provide some inspiration.

(seastar supports subsets of TCP, UDP, ICMP, and DHCP over IPv4; no IPv6
support)

[1] https://github.com/scylladb/seastar

> Did I miss any details or comments? Please reply and help me correct the comment or understanding.
>
> Thanks to everyone for attending and packing into a small space.
>
> --
> Regards,
> ++Keith Wiles
> Intel Corporation