From: Bruce Richardson
Subject: Re: tcpdump support in DPDK 2.3
Date: Wed, 16 Dec 2015 13:12:49 +0000
Message-ID: <20151216131249.GC10020@bricha3-MOBL3>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC358AF771@smartserver.smartshare.dk>
To: Morten Brørup
Cc: dev@dpdk.org

On Wed, Dec 16, 2015 at 01:26:11PM +0100, Morten Brørup wrote:
> Bruce,
>
> Please note that tcpdump is a stupid name for a packet capture application that supports much more than just TCP.
>
> I had missed the point about ethdev supporting virtual interfaces, so thank you for pointing that out. That covers my concerns about capturing packets inside tunnels.
>
> I will gladly admit that you Intel guys are probably much more competent in the field of DPDK performance and scalability than I am. So Matthew and I have been asking you to kindly ensure that your solution scales well at very high packet rates too, and pointing out that filtering before copying is probably cheaper than copying before filtering.
> You mention that it leads to an important choice about which lcores get to do the work of filtering the packets, so that might be worth some discussion.
>
> :-)
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>

Thanks for your support.

We may look at having a certain amount of flexibility in the configuration of the setup, so as to avoid limiting the use of the functionality.

For scalability at very high packet rates, it's something we'll need you guys to give us pointers on too: what's acceptable or not inside an app, and what level of scalability is needed. I'll admit that most of our initial thinking in this area was for debugging apps at less than line rate, i.e. for functional testing. For full line-rate introspection, we'll have to see when we get some working code.

/Bruce

>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: 16. december 2015 12:56
> To: Morten Brørup
> Cc: Matthew Hall; Kyle Larose; dev@dpdk.org
> Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
>
> On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> > Bruce,
> >
> > This doesn't really sound like tcpdump to me; it sounds like port mirroring.
>
> It's actually a bit of both, in my opinion: it's designed to allow basic mirroring of traffic on a port so that the traffic can be sent to a tcpdump destination.
> By going with a more generic approach, we hope to enable more possible use cases than just focusing on TCP.
>
> >
> > Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.
>
> Yes, the lack of attachment inside the app is a limitation. There are two types of scenarios that could be considered for packet capture:
> * ones where the application can be modified to do its own filtering and capturing.
> * ones where you want a generic capture mechanism which can be used on any application without modification.
> We have chosen to focus more on the second one, as that is where a generic solution for DPDK is likely to lie. For the first case, the application writer himself knows the type of traffic and how best to capture and filter it, so I don't think a generic one-size-fits-all solution is possible. [Though a couple of helper libraries may be of use.]
>
> As for physical ports, the scheme should work for any ethdev. Why do you see it as being limited to physical ports? What would you want to see monitored that we are missing?
>
> >
> > Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then need its own set of lcores, which would probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.
>
> Without modifying the application itself to do its own filtering, I suspect scalability is always going to be a problem. That being said, there is no particular reason why a single rte_ring needs to be used; we could allow one ring per NIC queue, for instance. The trouble with filtering at the source itself is that you put extra load on the I/O cores. By using a ring, we put the filtering load on extra cores in a secondary process, which can be scaled by the user without touching the main app.
>
> >
> > On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However, if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.
>
> Having this work with any application is one of our primary targets here.
> The app author should not have to worry too much about getting basic debug support.
> Even if it doesn't work at 40G small-packet rates, you can get a lot of benefit from a scheme that provides functional debugging for an app. Obviously, though, we aim to make this as scalable as possible, which is why we want to allow filtering in userspace before sending packets externally to DPDK.
>
> >
> > I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should support the packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG, based on metadata in the mbuf.
> >
> > Transferring a BPF filter from an outside application could be done using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration, by simply extending extcap to include such a BPF filter format.
> >
> > Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes to any destination it likes.
>
> Good, so we're not completely off-base here. :-)
>
> /Bruce
>
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: 16. december 2015 11:45
> >
> > Hi,
> >
> > we are currently doing some investigation and prototyping for this feature.
> > Our current thinking is the following:
> > * to allow dynamic control of the filtering, we are thinking of making use of
> >   the multi-process infrastructure in DPDK.
> >   A secondary process can attach to a primary at runtime and provide the
> >   packet filtering and dumping capability.
> > * ideally we want to create a generic packet mirroring callback inside the EAL,
> >   that can be set up to mirror packets going through Rx/Tx on an ethdev.
> > * using this, packets being received on the port to be monitored are sent via
> >   an rte_ring (ring ethdev) to the secondary process, which takes those packets
> >   and does any filtering on them. [This would be where BPF could fit into
> >   things, but it's not something we have looked at yet.]
> > * initially we plan to have the secondary process then write packets to a pcap
> >   file using a pcap PMD, but down the road, if we get other PMDs, like a KNI PMD
> >   or a TAP device PMD, those could be used as targets instead.
> >
> > We hope this implementation will provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
> >
> > Additional feedback welcome, as always. :-)
> >
> > Regards,
> > /Bruce
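Morten's idea of shipping filters as `tcpdump -ddd` text, and the slot Bruce leaves for BPF in the secondary process, can be illustrated with a small sketch. It is plain C with no DPDK calls. `struct cbpf_insn`, `load_ddd()` and `run_cbpf()` are hypothetical names, and only a handful of classic BPF opcodes are handled: absolute byte, half-word and word loads, `ld #len`, `ja`, `jeq`/`jgt`/`jge`/`jset` against a constant, and `ret`. Anything else is rejected at load time. The sketch assumes the `-ddd` layout tcpdump documents: an instruction count, then one `code jt jf k` line of decimal numbers per instruction. A real filter would more likely reuse a complete, validated interpreter such as libpcap's `bpf_filter()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical instruction layout: the four decimal fields that
 * "tcpdump -ddd" prints per line (code jt jf k). */
struct cbpf_insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* The classic BPF opcodes this sketch understands; all others are refused. */
static int known_opcode(uint16_t code)
{
	switch (code) {
	case 0x20: case 0x28: case 0x30:            /* ld/ldh/ldb [k] */
	case 0x80:                                  /* ld #len */
	case 0x05:                                  /* ja k */
	case 0x15: case 0x25: case 0x35: case 0x45: /* jeq/jgt/jge/jset #k */
	case 0x06: case 0x16:                       /* ret #k / ret a */
		return 1;
	default:
		return 0;
	}
}

/* Parse "-ddd" text into prog[0..max-1]. Returns the instruction
 * count, or -1 on malformed input, unknown opcodes or bad jumps. */
static int load_ddd(const char *text, struct cbpf_insn *prog, int max)
{
	int n, used;

	if (sscanf(text, "%d%n", &n, &used) != 1 || n <= 0 || n > max)
		return -1;
	text += used;
	for (int i = 0; i < n; i++) {
		unsigned code, jt, jf;
		unsigned long k;
		unsigned long room = (unsigned long)(n - i - 1); /* insns after this one */

		if (sscanf(text, "%u %u %u %lu%n", &code, &jt, &jf, &k, &used) != 4)
			return -1;
		text += used;
		if (code > 0xFFFF || jt > 0xFF || jf > 0xFF || k > 0xFFFFFFFFUL ||
		    !known_opcode((uint16_t)code))
			return -1;
		/* Jump offsets are relative to the next instruction and only go
		 * forward, so keeping them in range guarantees termination. */
		if (code == 0x05 && k >= room)
			return -1;
		if (code != 0x05 && (code & 0x07) == 0x05 && (jt >= room || jf >= room))
			return -1;
		prog[i] = (struct cbpf_insn){ (uint16_t)code, (uint8_t)jt,
					      (uint8_t)jf, (uint32_t)k };
	}
	return n;
}

/* Run a program accepted by load_ddd() over one packet. The result is
 * the number of bytes to capture; 0 means the packet is dropped. */
static uint32_t run_cbpf(const struct cbpf_insn *prog, int n,
			 const uint8_t *pkt, uint32_t len)
{
	uint32_t a = 0; /* BPF accumulator */

	assert(prog != NULL && n > 0);
	for (int pc = 0; pc < n; pc++) {
		const struct cbpf_insn *in = &prog[pc];
		uint32_t k = in->k;

		switch (in->code) {
		case 0x20: /* ld [k]: 32-bit big-endian load; out of bounds drops */
			if (k > len || len - k < 4)
				return 0;
			a = (uint32_t)pkt[k] << 24 | (uint32_t)pkt[k + 1] << 16 |
			    (uint32_t)pkt[k + 2] << 8 | pkt[k + 3];
			break;
		case 0x28: /* ldh [k] */
			if (k > len || len - k < 2)
				return 0;
			a = (uint32_t)pkt[k] << 8 | pkt[k + 1];
			break;
		case 0x30: /* ldb [k] */
			if (k >= len)
				return 0;
			a = pkt[k];
			break;
		case 0x80: a = len; break;                         /* ld #len */
		case 0x05: pc += (int)k; break;                     /* ja */
		case 0x15: pc += (a == k) ? in->jt : in->jf; break; /* jeq */
		case 0x25: pc += (a > k) ? in->jt : in->jf; break;  /* jgt */
		case 0x35: pc += (a >= k) ? in->jt : in->jf; break; /* jge */
		case 0x45: pc += (a & k) ? in->jt : in->jf; break;  /* jset */
		case 0x06: return k;                                /* ret #k */
		case 0x16: return a;                                /* ret a */
		default: return 0;
		}
	}
	return 0; /* fell off the end: treat as drop */
}
```

As a usage example, the hand-written program `4 / 40 0 0 12 / 21 0 1 2048 / 6 0 0 262144 / 6 0 0 0` (one instruction per line) loads the EtherType at offset 12 and accepts only IPv4 frames; it has the shape of `tcpdump -ddd ip`, although the snapshot length tcpdump emits depends on its version. In the design above, a secondary process would call `run_cbpf()` on each packet dequeued from the mirror ring and write a packet to the pcap target only when the result is non-zero. Running the same interpreter on the I/O lcores instead would give Morten's filter-before-copy variant; the interpreter itself does not care which cores it runs on.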