From: Thomas Graf <tgraf@suug.ch>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>,
	netdev@vger.kernel.org, davem@davemloft.net,
	nhorman@tuxdriver.com, andy@greyhouse.net, dborkman@redhat.com,
	ogerlitz@mellanox.com, jesse@nicira.com, pshelar@nicira.com,
	azhou@nicira.com, ben@decadent.org.uk,
	stephen@networkplumber.org, jeffrey.t.kirsher@intel.com,
	vyasevic@redhat.com, xiyou.wangcong@gmail.com,
	john.r.fastabend@intel.com, edumazet@google.com,
	jhs@mojatatu.com, sfeldma@cumulusnetworks.com,
	f.fainelli@gmail.com, roopa@cumulusnetworks.com,
	linville@tuxdriver.com, dev@openvswitch.org, jasowang@redhat.com,
	ebiederm@xmission.com, nicolas.dichtel@6wind.com,
	ryazanov.s.a@gmail.com, buytenh@wantstofly.org,
	aviadr@mellanox.com, nbd@openwrt.org, Neil.Jerram@metaswitch.com,
	ronye@mellanox.com
Subject: Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
Date: Mon, 15 Sep 2014 13:43:58 +0100
Message-ID: <20140915124358.GA21541@casper.infradead.org>
In-Reply-To: <20140909210910.GA25899@ITs-MacBook-Pro.local>

On 09/09/14 at 02:09pm, Alexei Starovoitov wrote:
> On Mon, Sep 08, 2014 at 02:54:13PM +0100, Thomas Graf wrote:
> > [0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
> > (Publicly accessible and open for comments)
> 
> Great doc. Very clear. I wish I could write docs like this :)
> 
> Few questions:
> - on the 1st slide dpdk is used accept vm and lxc packet. How is that working?
>   I know of 3 dpdk mechanisms to receive vm traffic, but all of them are kinda
>   deficient, since offloads need to be disabled inside VM, so VM to VM
>   performance over dpdk is not impressive. What is there for lxc?
>   Is there a special pmd that can take packets from veth?

Glad to see you are paying attention ;-) I'm assuming somebody will
write a veth PMD at some point. It does not exist yet.

> - full offload vs partial.
>   The doc doesn't say, but I suspect we want transition from full to partial
>   to be transparent? Especially for lxc. criu should be able to snapshot
>   container on one box with full offload and restore it seamlessly on the
>   other machine with partial offload, right?

Correct. In a full offload environment, the CPU path could still use
partial offload functionality. I'll update the doc.

> - full offload with two nics.
>   how bonding and redundancy suppose to work in such case?
>   If wire attached to eth0 no longer passing packet, how traffic from VM1
>   will reach eth1 on a different nic? Via sw datapath (flow table) ?

Yes.

>   I suspect we want to reuse current bonding/team abstraction here.

Yes, both the kernel bond/team and the OVS group table based abstraction
(see Simon's recent efforts).

>   I'm not quite getting the whole point of two separate physical nics.
>   Is it for completeness and generality of the picture ?

Correct. It is entirely to outline the more difficult case of multiple
physical NICs.

>   I think typical hypervisor will likely have only one multi-port nic, then
>   bonding can be off-loaded within single nic via bonding driver.

Agreed. I would expect that to be the reference architecture.

>   Partial offload scenario doesn't have this issue, since 'flow table'
>   is fed by standard netdev which can be bond-dev and everything else, right?

It is unclear how a virtual LAG device would forward flow table hints.
A particular NIC might take 5 tuples which point to a particular fwd
entry. The state of this offload might be different on individual NICs
forming a bond. In either case, it should be abstracted.

> - number of VFs
>   I believe it's still very limited even in the newest nics, but
>   number of containers will be large.

Agreed.

>   So some lxcs will be using VFs and some will use standard veth?

Likely yes. I could foresee this being driven by an elephant detection
algorithm or by policy.
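
Purely as an illustration, and not something taken from the design doc:
a minimal sketch of what such a detection-driven split could look like.
It assumes per-container byte counts over some measurement window are
available. The function name, the inputs and the threshold are all
hypothetical.

```python
def assign_ports(byte_counts, num_vfs, elephant_threshold):
    """Map each container to "vf" or "veth" (hypothetical policy sketch).

    byte_counts: dict of container name -> bytes seen in the last window.
    num_vfs: number of VFs the NIC can hand out.
    elephant_threshold: minimum byte count to be considered an elephant.
    """
    # Only containers at or above the threshold are candidates for a VF.
    elephants = [c for c, b in byte_counts.items() if b >= elephant_threshold]
    # Heaviest first; ties are broken by name so the result is deterministic.
    elephants.sort(key=lambda c: (-byte_counts[c], c))
    # The VF pool is limited, so only the top num_vfs candidates get one.
    vf_holders = set(elephants[:num_vfs])
    # Everybody else stays on a standard veth.
    return {c: ("vf" if c in vf_holders else "veth") for c in byte_counts}


if __name__ == "__main__":
    counts = {"lxc-a": 100, "lxc-b": 5000, "lxc-c": 9000, "lxc-d": 7000}
    print(assign_ports(counts, num_vfs=2, elephant_threshold=1000))
```

A static policy would simply replace the byte-count ranking with a
configured list of containers. Neither variant addresses swapping a
port type at runtime.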

>   We cannot swap them dynamically based on load, so I'm not sure
>   how VF approach is generically applicable here. For some use cases
>   with demanding lxcs, it probably helps, but is it worth the gains?

TBH, I don't know. I'm trying to figure that out ;-) It is obvious that
dedicating individual cores to PMDs is not ideal either for lxc type
workloads. The same question also applies to live migration which is
complicated by this and DPDK type setups. However, I believe that a
proper HW offload abstraction API is superior in terms of providing
virtualization abstraction but I'm afraid we won't know for sure until
we've actually tried it ;-)

