From: "Mark D. Gray"
Reply-To: mark.d.gray@intel.com
Subject: Re: [ovs-dev] Status of Open vSwitch with DPDK
Date: Mon, 17 Aug 2015 15:53:01 +0100
Message-ID: <55D1F54D.9070205@intel.com>
References: <738D45BC1F695740A983F43CFE1B7EA9437C8255@IRSMSX108.ger.corp.intel.com> <20150815071630.GB2600@x240.home>
In-Reply-To: <20150815071630.GB2600@x240.home>
To: Daniele Di Proietto, "dev@openvswitch.org", dev
List-Id: patches and discussions about DPDK

On 08/15/15 08:16, Flavio Leitner wrote:
> On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
>> Hi Daniele,
>>
>> Thanks for starting this conversation. It is a good list :) I have
>> cross-posted this to dpdk.org as I feel that some of the points could be
>> interesting to that community as they are related to how DPDK is used.
>>
>> How do "users" of OVS with DPDK feel about this list? Does anyone
>> disagree or does anyone have any additions? What are your experiences?
>>
>>>
>>> There has been some discussion lately about the status of the Open vSwitch
>>> port to DPDK. While part of the code has been tested for quite some time,
>>> I think we can agree that there are a few rough spots that prevent it from
>>> being easily deployed and used.
>>>
>>> I was hoping to get some feedback from the community about those rough
>>> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
>>> "production ready" and user-friendly.
>>>
>>> - PMD threads and queues management: the code has shown several bugs and
>>>   the netdev interfaces don't seem up to the job anymore.
>>
>> You had a few ideas about how to refactor this before but I was concerned
>> about the effect it would have on throughput. I can't find the thread.
>>
>> Do you have some further ideas about how to achieve this?
>
> I miss the fact that we can't tell which queue can go to each PMD and
> also that all devices must have the same number of rx queues. I agree
> that there are other issues, but it seems the kind of configuration
> knobs I am looking for might not be the end goal, since what has been
> said is to look for a more automated way. Having said so, I would also
> like to hear if you have further ideas about how to achieve that.
>
>
>>> There's a lot of margin for improvement: we could factor out the code from
>>> dpif-netdev, add configuration parameters for advanced users, and figure
>>> out a way to add unit tests.
>>>
>>
>> I think this is a general issue with both the kernel datapath (and netdevs)
>> and the userspace datapath. There isn't much unit testing (or testing)
>> outside of the slow path.
>
> Maybe we could exercise the interfaces using the pcap pmd.
>

We had a similar idea. Using this, it would be possible to test the
entire datapath or netdev for functionality! I don't think there is an
equivalent for the kernel datapath?

>>> Related to this, the system should be as fast as possible out-of-the-box,
>>> without requiring too much tuning.
>>
>> This is a good point. I think the kernel datapath has a similar issue. You
>> can get a certain level of performance without compiling with -Ofast or
>> pinning threads, but you will (even with the kernel datapath) get better
>> performance if you pin threads (and possibly compile differently). I guess
>> it is more visible with the dpdk datapath as performance is one of the key
>> values.
>> It is also more detrimental to the performance if you don't set it
>> up correctly.
>
> Not only that, you need to consider how the resources will be
> distributed upfront so that you don't run out of hugepages, perhaps
> isolate PMD CPUs from the Linux scheduler, etc. So, I think a more
> realistic goal would be: the system should require minimal/no tuning
> to run with acceptable performance.
>

How do you define "acceptable" performance :)?

>
>> Perhaps we could provide scripts to help do this?
>
> Or profiles (if that isn't included in your scripts definition).
> Maybe we should define profiles like "performance", "minimum cores", etc.
>
>> I think this is also interesting to the DPDK community. There is
>> knowledge required when running DPDK-enabled apps to
>> get good performance: core pinning is one thing that comes to mind.
>>
>>>
>>> - Userspace tunneling: while the code has been there for quite some time,
>>>   it hasn't received the level of testing that the Linux kernel datapath
>>>   tunneling has.
>>>
>>
>> Again, there is a lack of test infrastructure in general for OVS. vsperf
>> is a good start, and it would be great to see more people use and
>> contribute to it!
>
> Yes.
>
>
>>> - Documentation: other than a step-by-step tutorial, it cannot be said
>>>   that DPDK is a first-class citizen in the OVS documentation. Manpages
>>>   could be improved.
>>
>> Easily done. The INSTALL guide is pretty good but the structure could be
>> better. There is also a lack of manpages. Good point.
>
> Yup.
>
>
>>> - Vhost: the code has not received the level of testing of the kernel
>>>   vhost. Another doubt shared by some developers is whether we should keep
>>>   vhost-cuse, given its relatively low ease of use and the overlap with
>>>   the far more standard vhost-user.
>>
>> vhost-cuse is required for older versions of qemu. I'm aware of some
>> companies using it as they are restricted to an older version of qemu.
>> I think it is deprecated
>> at the moment? Is there a notice to that effect? We just need a plan for
>> when to remove it and make sure that plan is clear?
>
> Apparently having two solutions to address the same issue causes more
> harm than good, so removing vhost-cuse would be helpful. I agree that
> we need a clear plan with a soak time so users can either upgrade to
> vhost-user or tell us why they can't.
>
>
>>> - Interface management and naming: interfaces must be manually removed
>>>   from the kernel drivers.
>>>
>>>   We still don't have an easy way to identify them. Ideas are welcome:
>>>   how can we make this user friendly? Is there a better solution on the
>>>   DPDK side?
>>
>> This is a tough one and is interesting to the DPDK community. The basic
>> issue here is that users are more familiar with linux interfaces and linux
>> naming conventions.
>>
>> "ovs-vsctl add-port br0 eth0" makes a lot more sense than
>>
>> "dpdk_nic_bind -b igb_uio", then check the order that the ports
>> are enumerated, and then run "ovs-vsctl add-port br0 dpdkN".
>>
>> I can think of ways to do this with physical NICs. For example, you could
>> reference the port by the linux name and, when you try to add it, OVS
>> could unbind it from the kernel module and bind it to igb_uio?
>>
>> However, I am not sure how you would do it with virtual nics as there is
>> not even a real device.
>>
>> I think a general solution from the dpdk community would be really helpful
>> here.
>
>
> It doesn't look like openvswitch is the right place to fix this.
> Openvswitch should deal with the port and the system should provide
> the port somehow. That's what happens with the kernel datapath, for
> instance: openvswitch doesn't load any NIC driver.
>
> So, it seems to be more related to udev/systemd configuration in which
> the sysadmin would tell the interfaces and the appropriate driver
> (UIO/VFIO/Bifurcated...).
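To make the contrast concrete, the manual workflow described above looks roughly like this. This is a sketch, not a recipe: the hugepage count, the PCI address (0000:01:00.0), the DPDK build path, and the resulting dpdkN number are placeholders that vary per system and DPDK version.

```shell
# Reserve 2MB hugepages and make them available (sizes are examples).
sysctl -w vm.nr_hugepages=1024
mount -t hugetlbfs none /dev/hugepages

# Load the UIO infrastructure and DPDK's igb_uio module, then detach
# the NIC from its kernel driver. Find the PCI address behind "eth0"
# with "dpdk_nic_bind --status" (tools/dpdk_nic_bind.py in the DPDK tree).
modprobe uio
insmod "$DPDK_BUILD"/kmod/igb_uio.ko
dpdk_nic_bind -b igb_uio 0000:01:00.0

# DPDK ports are then named dpdk0, dpdk1, ... in enumeration order,
# which is exactly why mapping back to "eth0" is non-obvious.
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
```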
>
> Even if the system delivers the DPDK port ready, it would be great to
> have some friendly mapping so that users can refer to ports with known
> names.
>

Agreed.

>
>>> How are DPDK interfaces handled by linux distributions? I've heard about
>>> ongoing work for RHEL and Ubuntu; it would be interesting to coordinate.
>
> We have implemented dpdk/vhost support in initscripts so you could
> configure the ports in the same way as for the kernel devices, but
> how to properly bind to the driver is unclear yet.
>
>
>>> - Insight into the system and debuggability: nothing beats tcpdump for
>>>   the kernel datapath. Can something similar be done for the userspace
>>>   datapath?
>>
>> Yeah, this would be useful. I have my own way of dealing with this. For
>> example, you could dump from the LOCAL port on a NORMAL bridge or add a
>> rule to mirror a flow to another port, but I feel there could be a better
>> way to do this in DPDK. I have recently heard that the DPDK team do
>> something with a pcap pmd to help with debugging. A more general approach
>> from dpdk would help a lot.
>
> One idea maybe is that openvswitch could provide a mode to clone TX/RX
> packets to a pcap pmd. Or write the packets using pcap format directly
> to a file (avoiding another pmd which might not be available). Or even
> push them out using a tap device. Either way, tcpdump or wireshark would
> work.
>
>
>>> - Consistency of the tools: some commands are slightly different for the
>>>   userspace/kernel datapath. Ideally there shouldn't be any difference.
>
> Could you give some examples?
>
>
>> Yeah, there are some things that could be changed. DPDK just works
>> differently, but the benefits are significant :)
>>
>> We need to mount hugepages, bind nics to igb_uio, etc.
>>
>> With a lot of this stuff, maybe the DPDK community's tools don't need
>> to emulate the linux networking tools exactly.
>> Maybe over time, as the DPDK community
>> and user-base expand, people will become more familiar with the tools,
>> processes, etc., and this will be less of an issue?
>>
>>
>>>
>>> - Packaging: how should the distributions package DPDK and OVS? Should
>>>   there only be a single build to handle both the kernel and the
>>>   userspace datapath, eventually dynamically linked to DPDK?
>>
>> Yeah. Do we need to start with dpdk if we have compiled with DPDK support?
>
> Well, certainly not everybody wants to have DPDK dependencies, neither
> shared nor statically linked. Maybe the path is a plug-in architecture?
>
>
>>> - Benchmarks: we often rely on extremely simple flow tables with
>>>   single-flow traffic to evaluate the effect of a change. That may be ok
>>>   during development, but OVS with the kernel datapath has been tested in
>>>   different scenarios with more complicated flow tables and even with
>>>   hostile traffic patterns.
>>>
>>>   Efforts in this sense are being made, like the vsperf project, or even
>>>   the simple ovs-pipeline.py.
>>
>> vsperf will really help this.
>
> Indeed, but how is the OVS kernel datapath being tested? Is there a
> script? Maybe we can use the same tests for DPDK.
>
>
>>> I would appreciate feedback on the above points, not (only) in terms of
>>> solutions, but in terms of requirements that you feel are important for
>>> our system to be considered ready.
>
> The list covers technical issues, documentation issues and usability
> issues, which is great; thanks for doing it. However, as said, one
> important use-case is extreme performance, and that requires configuration
> or tuning flexibility, which adds usability/supportability issues. Will
> those knobs be a valid option provided that the defaults work well enough?
>

I feel that we need to expose knobs up through Open vSwitch in order to
tune for extreme performance; otherwise, how do we highlight the value in
what we are doing?
I think we need some way to allow a user to do this
type of configuration when they know what they are doing (without having
to recompile the code).

> Thanks,
> fbl
>
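To illustrate the kind of knob under discussion: PMD placement is already exposed through the OVS database (documented in INSTALL.DPDK.md around the time of this thread). The mask value and core numbers below are illustrative, not tuning advice.

```shell
# Pin PMD threads to cores 1 and 2 (0x6 is a CPU bitmask); this stops
# the scheduler from moving the packet-processing threads around.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6

# Complementary host-side tuning: keep those cores away from the Linux
# scheduler entirely by booting with "isolcpus=1,2" on the kernel
# command line, so only the PMD threads run there.
```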