From: Bruce Richardson
Subject: Re: tcpdump support in DPDK 2.3
Date: Wed, 16 Dec 2015 11:56:11 +0000
Message-ID: <20151216115611.GB10020@bricha3-MOBL3>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC358AF76F@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC358AF758@smartserver.smartshare.dk>
 <20151214182931.GA17279@mhcomputing.net>
 <20151214223613.GC21163@mhcomputing.net>
 <20151216104502.GA10020@bricha3-MOBL3>
 <98CBD80474FA8B44BF855DF32C47DC358AF76F@smartserver.smartshare.dk>
To: Morten Brørup
Cc: dev@dpdk.org
List-Id: patches and discussions about DPDK

On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> Bruce,
>
> This doesn't really sound like tcpdump to me; it sounds like port mirroring.

It's actually a bit of both, in my opinion: it's designed to allow basic
mirroring of traffic on a port so that the traffic can be sent to a tcpdump
destination. By going with a more generic approach, we hope to enable more
possible use cases than just focusing on TCP.

>
> Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.

Yes, the lack of attachment inside the app is a limitation. There are two
types of scenarios that could be considered for packet capture:
* ones where the application can be modified to do its own filtering and
  capturing.
* ones where you want a generic capture mechanism which can be used on any
  application without modification.

We have chosen to focus more on the second one, as that is where a generic
solution for DPDK is likely to lie. For the first case, the application writer
himself knows the type of traffic and how best to capture and filter it, so I
don't think a generic one-size-fits-all solution is possible. [Though a couple
of helper libraries may be of use]

As for physical ports, the scheme should work for any ethdev - why do you see
it only being limited to physical ports? What would you want to see monitored
that we are missing?

>
> Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then require its own set of lcores to probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.

Without modifying the application itself to do its own filtering I suspect
scalability is always going to be a problem. That being said, there is no
particular reason why a single rte_ring needs to be used - we could allow one
ring per NIC queue for instance. The trouble with filtering at the source
itself is that you put extra load on the IO cores. By using a ring, we put the
filtering load on extra cores in a secondary process which can be scaled by
the user without touching the main app.

>
> On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.

Having this work with any application is one of our primary targets here. The
app author should not have to worry too much about getting basic debug support.
Even if it doesn't work at 40G small packet rates, you can get a lot of benefit
from a scheme that provides functional debugging for an app. Obviously, though,
we aim to make this as scalable as possible, which is why we want to allow
filtering in userspace before sending packets externally to DPDK.

>
> I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should implement packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on metadata in the mbuf.
>
> Transferring a BPF filter from an outside application could be done by using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration by simply extending extcap to include such a BPF filter format.
>
>
> Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes to any destination it likes.

Good, so we're not completely off-base here. :-)

/Bruce

>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: 16. december 2015 11:45
>
> Hi,
>
> we are currently doing some investigation and prototyping for this feature.
> Our current thinking is the following:
> * to allow dynamic control of the filtering, we are thinking of making use of
>   the multi-process infrastructure in DPDK.
>   A secondary process can attach to a primary at runtime and provide the
>   packet filtering and dumping capability.
> * ideally we want to create a generic packet mirroring callback inside the EAL,
>   that can be set up to mirror packets going through Rx/Tx on an ethdev.
> * using this, packets being received on the port to be monitored are sent via
>   an rte_ring (ring ethdev) to the secondary process which takes those packets
>   and does any filtering on them. [This would be where BPF could fit into
>   things, but it's not something we have looked at yet.]
> * initially we plan to have the secondary process then write packets to a pcap
>   file using a pcap PMD, but down the road if we get other PMDs, like a KNI PMD
>   or a TAP device PMD, those could be used as targets instead.
>
> This implementation we hope should provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
>
> Additional feedback welcome, as always. :-)
>
> Regards,
> /Bruce
>
>
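
To make the "one ring per NIC queue" idea from the thread concrete, here is a
minimal sketch in plain C of a single-producer/single-consumer mirror ring that
drops rather than blocks when the capture side falls behind, so the IO core
never waits on the filter. This is not the rte_ring API and not code from the
planned implementation: struct mirror_ring, mirror_enqueue(), mirror_dequeue()
and MIRROR_RING_SIZE are hypothetical names, and the sketch assumes exactly one
producer (the IO core for that queue) and one consumer (a capture core) per
ring. A real mirror would also have to copy or reference-count the packet
buffer, which is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIRROR_RING_SIZE 8u   /* hypothetical size; must be a power of two */

/* One ring per NIC queue: the IO core produces, a capture core consumes. */
struct mirror_ring {
    atomic_uint head;               /* written only by the producer */
    atomic_uint tail;               /* written only by the consumer */
    uint64_t dropped;               /* mirror copies lost because the ring was full */
    void *slots[MIRROR_RING_SIZE];
};

/* Producer side: never blocks. A full ring means the packet is simply
 * not mirrored; the application's own datapath is unaffected. */
bool mirror_enqueue(struct mirror_ring *r, void *pkt)
{
    assert(r != NULL);
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == MIRROR_RING_SIZE) {
        r->dropped++;
        return false;
    }
    r->slots[head & (MIRROR_RING_SIZE - 1)] = pkt;
    /* Publish the slot before the consumer can observe the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL when the ring is empty. */
void *mirror_dequeue(struct mirror_ring *r)
{
    assert(r != NULL);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return NULL;
    void *pkt = r->slots[tail & (MIRROR_RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return pkt;
}
```

With an array such as "struct mirror_ring rings[NB_QUEUES]" (NB_QUEUES being
whatever the port is configured with), each Rx queue feeds its own ring and the
user can add capture cores without touching the main app. The dropped counter
makes the trade-off visible: at high packet rates the capture is lossy, but the
forwarding path is not slowed down.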
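
The suggestion of passing a filter in the "tcpdump -ddd" text format can be
sketched as follows. That format is a first line holding the instruction count,
followed by one line of four decimal numbers (opcode, jump-if-true,
jump-if-false, k) per classic BPF instruction. The C below parses that text and
runs it with a deliberately tiny interpreter that only understands three
opcodes (load half-word at absolute offset, jump-if-equal, return), which is
enough for a filter of the shape generated for "ip" on Ethernet. The names
cbpf_insn, cbpf_parse_ddd() and cbpf_run() are hypothetical, nothing here is a
DPDK API, and a usable library would need the full instruction set plus the
mbuf-derived ancillary data mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One classic BPF instruction, in the field order "tcpdump -ddd" prints. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Parse the decimal text form: an instruction count, then "code jt jf k"
 * per instruction. Returns the count, or -1 on malformed input. */
int cbpf_parse_ddd(const char *text, struct cbpf_insn *prog, int max_insns)
{
    int count, used;

    assert(text != NULL && prog != NULL);
    if (sscanf(text, "%d%n", &count, &used) != 1 ||
        count <= 0 || count > max_insns)
        return -1;
    text += used;
    for (int i = 0; i < count; i++) {
        unsigned code, jt, jf, k;

        if (sscanf(text, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
            return -1;
        if (code > 0xffff || jt > 0xff || jf > 0xff)
            return -1;
        prog[i] = (struct cbpf_insn){ (uint16_t)code, (uint8_t)jt,
                                      (uint8_t)jf, (uint32_t)k };
        text += used;
    }
    return count;
}

/* The only opcodes this sketch handles (classic BPF encoding). */
enum { OP_LDH_ABS = 0x28, OP_JEQ_K = 0x15, OP_RET_K = 0x06 };

/* Run the filter on one packet. Returns the number of bytes to capture,
 * 0 meaning "drop". Unsupported opcodes and out-of-bounds loads drop. */
uint32_t cbpf_run(const struct cbpf_insn *prog, int n,
                  const uint8_t *pkt, uint32_t len)
{
    uint32_t acc = 0;

    for (int pc = 0; pc < n; pc++) {
        const struct cbpf_insn *in = &prog[pc];

        switch (in->code) {
        case OP_LDH_ABS:            /* acc = big-endian 16 bits at pkt[k] */
            if (in->k > len || len - in->k < 2)
                return 0;
            acc = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case OP_JEQ_K:              /* offsets are relative to the next insn */
            pc += (acc == in->k) ? in->jt : in->jf;
            break;
        case OP_RET_K:
            return in->k;
        default:
            return 0;
        }
    }
    return 0;                       /* fell off the end: drop */
}
```

As a usage example, the text "4\n40 0 0 12\n21 0 1 2048\n6 0 0 262144\n6 0 0 0\n"
loads the EtherType at offset 12, compares it with 0x0800 and returns a capture
length for IPv4 frames or 0 otherwise; the accept length in a real dump depends
on the snaplen tcpdump was run with. A secondary process could receive such
text over any control channel and apply cbpf_run() to each packet it dequeues
before writing it out.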