From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Thomas Graf <tgraf@suug.ch>
Cc: Jiri Pirko <jiri@resnulli.us>,
	netdev@vger.kernel.org, davem@davemloft.net,
	nhorman@tuxdriver.com, andy@greyhouse.net, dborkman@redhat.com,
	ogerlitz@mellanox.com, jesse@nicira.com, pshelar@nicira.com,
	azhou@nicira.com, ben@decadent.org.uk,
	stephen@networkplumber.org, jeffrey.t.kirsher@intel.com,
	vyasevic@redhat.com, xiyou.wangcong@gmail.com,
	john.r.fastabend@intel.com, edumazet@google.com,
	jhs@mojatatu.com, sfeldma@cumulusnetworks.com,
	f.fainelli@gmail.com, roopa@cumulusnetworks.com,
	linville@tuxdriver.com, dev@openvswitch.org, jasowang@redhat.com,
	ebiederm@xmission.com, nicolas.dichtel@6wind.com,
	ryazanov.s.a@gmail.com, buytenh@wantstofly.org,
	aviadr@mellanox.com, nbd@openwrt.org, Neil.Jerram@metaswitch.com,
	ronye@mellanox.com
Subject: Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
Date: Tue, 9 Sep 2014 14:09:12 -0700
Message-ID: <20140909210910.GA25899@ITs-MacBook-Pro.local>
In-Reply-To: <20140908135413.GA5995@casper.infradead.org>

On Mon, Sep 08, 2014 at 02:54:13PM +0100, Thomas Graf wrote:
> On 09/03/14 at 11:24am, Jiri Pirko wrote:
> > This patchset can be divided into 3 main sections:
> > - introduce switchdev api for implementing switch drivers
> > - add hardware acceleration bits into openvswitch datapath, This uses
> >   previously mentioned switchdev api
> > - introduce rocker switch driver which implements switchdev api
> 
> Jiri, Scott,
> 
> Enclosed is the GOOG doc which outlines some details on my particular
> interests [0]. It includes several diagrams which might help to
> understand the overall arch. It is highly related to John's work as
> well. Please let me know if something does not align with the model
> you have in mind.
> 
> Summary:
> The full virtual tunnel endpoint flow offload attempts to offload full
> flows to the hardware and utilize the embedded switch on the host NIC
> to empower the eSwitch with the required flexibility of the software
> driven network. In this model, the guest (VM or LXC) attaches through a
> SR-IOV VF which serves as the primary path. A slow path / software path
> is provided via the CPU which can route packets back into the VF by
> tagging packets with forwarding metadata and sending the frame back to
> the NIC.
> 
> [0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
> (Publicly accessible and open for comments)

Great doc. Very clear. I wish I could write docs like this :)

A few questions:
- On the 1st slide dpdk is used to accept vm and lxc packets. How does that
  work? I know of 3 dpdk mechanisms to receive vm traffic, but all of them are
  kinda deficient, since offloads need to be disabled inside the VM, so VM to VM
  performance over dpdk is not impressive. What is there for lxc?
  Is there a special pmd that can take packets from veth?

- full offload vs partial.
  The doc doesn't say, but I suspect we want the transition from full to
  partial to be transparent? Especially for lxc. criu should be able to
  snapshot a container on one box with full offload and restore it seamlessly
  on another machine with partial offload, right?

- full offload with two nics.
  How are bonding and redundancy supposed to work in such a case?
  If the wire attached to eth0 is no longer passing packets, how will traffic
  from VM1 reach eth1 on a different nic? Via the sw datapath (flow table)?
  I suspect we want to reuse the current bonding/team abstraction here.
  I'm not quite getting the whole point of two separate physical nics.
  Is it for completeness and generality of the picture?
  I think a typical hypervisor will likely have only one multi-port nic; then
  bonding can be offloaded within a single nic via the bonding driver.
  The partial offload scenario doesn't have this issue, since the 'flow table'
  is fed by a standard netdev, which can be a bond-dev or anything else, right?

- number of VFs
  I believe it's still very limited even in the newest nics, but
  the number of containers will be large.
  So some lxcs will be using VFs and some will use standard veth?
  We cannot swap them dynamically based on load, so I'm not sure
  how the VF approach is generically applicable here. For some use cases
  with demanding lxcs it probably helps, but are the gains worth it?

Thanks!
