netdev.vger.kernel.org archive mirror
* Open vSwitch Design
@ 2011-11-24 20:10 Jesse Gross
From: Jesse Gross @ 2011-11-24 20:10 UTC
  To: netdev, dev
  Cc: David Miller, Stephen Hemminger, Chris Wright, Herbert Xu,
	Eric Dumazet, John Fastabend, Justin Pettit, jhs

I realized that since Open vSwitch is so userspace-centric, some of the
design considerations might not be apparent from the kernel code
alone.  I did a poor job of explaining the larger picture, which has
led to some misconceptions, so I thought it would be helpful if I
gave a short overview.

One of the driving goals was to push as much logic as possible to
userspace, so the kernel portion is less than 6000 lines and has four
components:

 * Switching infrastructure:  As the name implies, Open vSwitch is
intended to be a network switch, focused on
virtualization/OpenFlow/software defined networking.  This means that
what we are modeling is not actually a collection of flows but a
switch which contains a group of related ports, a software virtual
device, etc.  The switch model is used in a variety of places, such as
to measure traffic that actually flows through it in order to
implement monitoring and sampling protocols.

 * Flow lookup:  Although used to implement OpenFlow, the kernel flow
table does not actually directly contain OpenFlow flows.  This is
because OpenFlow tables can contain wildcards, multiple pipeline
stages, etc. and we did not want to push that complexity into the
kernel fast path (nor tie it to a specific version of OpenFlow).
Instead, an exact-match flow table is populated on demand from
userspace based on the more complex rules stored there.  Although it
might seem limiting, this design has allowed significant new
functionality to be added without modifications to the kernel or
performance impact.

 * Packet execution:  Once a packet matches a flow, it can be output,
enqueued to a particular qdisc, etc.  Some of these operations are
specific to Open vSwitch, such as sampling, whereas for others we
leverage existing infrastructure (including tc for QoS) by simply
marking the packet for further processing.

 * Userspace interfaces:  One of the difficulties of having a
specialized, exact match flow lookup engine is maintaining
compatibility across differing kernel/userspace versions.  This
compatibility shows up heavily in the userspace interfaces and is
achieved by passing the kernel's version of the flow along with packet
information.  This allows userspace to install appropriate flows even
if its interpretation of a packet differs from the kernel's without
version checks or maintaining multiple implementations of the flow
extraction code in the kernel.
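
To make the flow lookup and upcall model above a bit more concrete,
here is a minimal Python sketch.  It is an illustration only, not the
Open vSwitch code: the class names (KernelDatapath, Userspace,
WildcardRule), the three-field flow key, and the action strings are
all hypothetical and were made up for this example.  It shows an
exact-match table on the fast path, a miss that hands the packet plus
the kernel's own flow key to userspace, and userspace installing a
flow under that key after consulting its wildcarded rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key: the fields the "kernel" extracts from a packet.
FlowKey = Tuple[int, str, str]  # (in_port, eth_src, eth_dst)


@dataclass
class WildcardRule:
    """Userspace-side rule; a field set to None matches anything."""
    in_port: Optional[int]
    eth_src: Optional[str]
    eth_dst: Optional[str]
    priority: int
    actions: List[str]

    def matches(self, key: FlowKey) -> bool:
        wanted = (self.in_port, self.eth_src, self.eth_dst)
        return all(w is None or w == got for w, got in zip(wanted, key))


class KernelDatapath:
    """Fast path: exact-match lookup only, no wildcard logic."""

    def __init__(self, userspace: "Userspace") -> None:
        self.flow_table: Dict[FlowKey, List[str]] = {}
        self.userspace = userspace
        self.misses = 0

    def extract_key(self, packet: dict) -> FlowKey:
        return (packet["in_port"], packet["eth_src"], packet["eth_dst"])

    def install_flow(self, key: FlowKey, actions: List[str]) -> None:
        self.flow_table[key] = actions

    def receive(self, packet: dict) -> List[str]:
        key = self.extract_key(packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # Miss: pass the packet together with the kernel's view of
            # the flow, so userspace never has to re-derive the key.
            self.misses += 1
            actions = self.userspace.handle_miss(self, packet, key)
        return actions


class Userspace:
    """Slow path: holds the wildcarded rules, highest priority first."""

    def __init__(self, rules: List[WildcardRule]) -> None:
        self.rules = sorted(rules, key=lambda r: -r.priority)

    def handle_miss(self, dp: KernelDatapath, packet: dict,
                    kernel_key: FlowKey) -> List[str]:
        actions: List[str] = []  # no matching rule means drop
        for rule in self.rules:
            if rule.matches(kernel_key):
                actions = rule.actions
                break
        # Install an exact-match flow keyed on what the kernel reported.
        dp.install_flow(kernel_key, actions)
        return actions
```

With a rule such as WildcardRule(1, None, None, 10, ["output:2"]), the
first packet arriving on port 1 takes the miss path and every later
packet with the same key is served from the exact-match table without
involving userspace.  New matching behavior only requires changing the
rules held by Userspace, which is the point of the split.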

It's obviously possible to put this code anywhere, whether it is an
independent module, in the bridge, or tc.  However, it's largely new
code that is geared towards this particular model, so it seems better
not to add to the complexity of existing components if at all
possible.
