From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
nhorman@tuxdriver.com, andy@greyhouse.net, tgraf@suug.ch,
dborkman@redhat.com, ogerlitz@mellanox.com, jesse@nicira.com,
jpettit@nicira.com, joestringer@nicira.com,
john.r.fastabend@intel.com, jhs@mojatatu.com, sfeldma@gmail.com,
f.fainelli@gmail.com, roopa@cumulusnetworks.com,
linville@tuxdriver.com, simon.horman@netronome.com,
shrijeet@gmail.com, gospo@cumulusnetworks.com, bcrl@kvack.org
Subject: Re: Flows! Offload them.
Date: Thu, 26 Feb 2015 10:42:25 -0500
Message-ID: <20150226154225.GA5940@oracle.com>
In-Reply-To: <20150226113942.GC1973@nanopsycho.lan>
>
> Sure. If you look into net/openvswitch/vport-vxlan.c for example, there
> is a socket created by vxlan_sock_add.
> vxlan_rcv is called on rx and vxlan_xmit_skb to xmit.
:
> What I have on mind is to allow to create tunnels using "ip" but not as
> a device but rather just as a wrapper of these functions (and others alike).
Could you elaborate on what the wrapper will look like? Will
it be a socket, or something else?
For contextual comparison: in the case of RDS, something similar is
achieved by creating a PF_RDS socket, which can then be used as a
datagram socket (i.e., no need to do connect/accept). The listen side
of the underlying TCP socket is created when the rds_tcp module is
initialized; the client side is created when an RDS packet is sent out.
In the rds module, the rds_sock gets plumbed up with the underlying
kernel TCP socket. The fanout per RDS port on the receive side happens
via ->sk_data_ready (in rds_tcp_ready). On the send side, rds_sendmsg
sets up the client socket (if necessary).
All of this is done such that multiple RDS sockets share a single
underlying kernel TCP socket.
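To make the sharing concrete, here is a minimal userspace model of the
receive-side fanout described above: several endpoints, each bound to
its own RDS port, hang off one shared connection, and each arriving
message is steered to the endpoint bound to its destination port. The
names (struct shared_conn, conn_bind, conn_fanout) are hypothetical and
this is not the rds module's code; the in-kernel lookup is keyed on the
bound address as well as the port, and is hashed rather than scanned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: many RDS endpoints share ONE transport connection
 * and are demultiplexed by destination RDS port. */
#define MAX_BOUND 16

struct rds_endpoint {          /* stand-in for a bound rds_sock */
    uint16_t port;             /* local RDS port (host byte order) */
    unsigned delivered;        /* messages steered to this endpoint */
};

struct shared_conn {           /* stand-in for the shared TCP socket */
    struct rds_endpoint *bound[MAX_BOUND];
    size_t nbound;
};

/* Attach an endpoint to the shared connection.
 * Returns 0 on success, -1 if the table is full or the port is taken. */
int conn_bind(struct shared_conn *c, struct rds_endpoint *ep)
{
    if (c->nbound == MAX_BOUND)
        return -1;
    for (size_t i = 0; i < c->nbound; i++)
        if (c->bound[i]->port == ep->port)
            return -1;
    c->bound[c->nbound++] = ep;
    return 0;
}

/* Per-message fanout, as a data-ready callback might do it: find the
 * endpoint bound to dport. Returns NULL (drop) if nothing is bound. */
struct rds_endpoint *conn_fanout(struct shared_conn *c, uint16_t dport)
{
    for (size_t i = 0; i < c->nbound; i++) {
        if (c->bound[i]->port == dport) {
            c->bound[i]->delivered++;
            return c->bound[i];
        }
    }
    return NULL;
}
```

The point of the model is that only the connection is shared; the
per-port state stays with each endpoint, so adding an endpoint never
creates another TCP socket.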
But perhaps there is one significant difference for vxlan: vxlan
encapsulates L2 frames in UDP, so the socket layering model
may not fit so well, except when userspace is creating an entire L2 frame
(which may be fine with ovs/dpdk; I'm not sure what scenarios you
have in mind).
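A sketch of why the socket model is awkward here: the UDP payload is
the 8-byte VXLAN header (flags byte with the I bit set, then a 24-bit
VNI, per RFC 7348) followed by a complete inner L2 frame, so a socket
user would have to build the whole Ethernet frame itself. vxlan_encap
is a hypothetical helper for illustration, not the kernel's
vxlan_xmit_skb.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VXLAN_HLEN    8
#define VXLAN_FLAG_I  0x08   /* "VNI is valid" flag, RFC 7348 */

/* Build a VXLAN UDP payload: header followed by the inner L2 frame.
 * Returns the payload length, or 0 if out is too small or the VNI
 * does not fit in 24 bits. */
size_t vxlan_encap(uint8_t *out, size_t outlen, uint32_t vni,
                   const uint8_t *frame, size_t framelen)
{
    if (vni > 0xFFFFFF || outlen < VXLAN_HLEN ||
        framelen > outlen - VXLAN_HLEN)
        return 0;
    memset(out, 0, VXLAN_HLEN);        /* reserved fields are zero */
    out[0] = VXLAN_FLAG_I;
    out[4] = (uint8_t)(vni >> 16);     /* VNI, big-endian, bytes 4..6 */
    out[5] = (uint8_t)(vni >> 8);
    out[6] = (uint8_t)vni;
    memcpy(out + VXLAN_HLEN, frame, framelen);
    return VXLAN_HLEN + framelen;
}
```

The result would then be sent on an ordinary UDP socket to the remote
endpoint (the IANA-assigned VXLAN port is 4789).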
> To identify the instance we name it (OVS has it identified and vport).
Not sure I follow the namespace you have in mind here. How is fanout
going to be achieved? (For RDS, we determine which endpoint should get
the packet based on the RDS sport/dport.)
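For RDS the fanout key lives inside the payload. A minimal sketch of
pulling sport/dport out of a buffer that starts at an RDS header,
assuming the struct rds_header layout in net/rds/rds.h (h_sequence,
h_ack and h_len precede h_sport and h_dport; fields are big-endian;
48 bytes total). Check the offsets against your tree before relying on
them; rds_hdr_ports is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed offsets within struct rds_header (net/rds/rds.h):
 * h_sequence(8) + h_ack(8) + h_len(4) = 20 bytes before h_sport. */
#define RDS_HDR_LEN    48
#define RDS_OFF_SPORT  20
#define RDS_OFF_DPORT  22

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Extract RDS ports from a buffer that begins with an RDS header.
 * Returns 0 on success, -1 if the buffer is shorter than a header. */
int rds_hdr_ports(const uint8_t *hdr, size_t len,
                  uint16_t *sport, uint16_t *dport)
{
    if (len < RDS_HDR_LEN)
        return -1;
    *sport = get_be16(hdr + RDS_OFF_SPORT);
    *dport = get_be16(hdr + RDS_OFF_DPORT);
    return 0;
}
```

Over rds_tcp the header is only at a known offset after the receiver
has reassembled the TCP byte stream into RDS messages; a single
segment's payload need not begin with a header.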
> After that, tc could allow to attach ingress qdisk not only to a device,
> but to this named socket as well. Similary with tc action mirred, it would
> be possible to forward not only to a device, but to this named socket as
> well. All should be very light.
This is the part that I'm interested in. In the RDS case, the flows
are going to be specified based on the sport/dport in the rds_header,
but as far as the rest of the TCP/IP stack is concerned, the rds_header
is just opaque payload bytes. I realize tc and iptables support that
kind of DPI in theory, and that one can use CLI interfaces to set this up
(I don't know if the system calls used by tc are published as a
stable library to applications?), but I would be interested in
kernel socket options to set up the tc hooks, so that operations on
the RDS socket can be translated into flows and other config
on the shared TCP socket.
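One hypothetical shape for "operations on the RDS socket translated
into flows on the shared TCP socket": binding an RDS port yields an
offset/mask/value match on the TCP payload, in the spirit of the tc u32
classifier. struct payload_key, rds_dport_key and payload_key_match are
illustrative names, not an existing kernel or tc API, and the offset 20
assumes the rds_header layout from net/rds/rds.h, where h_sport and
h_dport share one 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: a 32-bit big-endian word at a byte offset into
 * the TCP payload, compared under a mask (u32-classifier style). */
struct payload_key {
    size_t   offset;   /* byte offset from start of TCP payload */
    uint32_t mask;
    uint32_t value;
};

/* Key matching one RDS destination port: dport is the low 16 bits of
 * the word at offset 20 (assumed rds_header layout). */
struct payload_key rds_dport_key(uint16_t dport)
{
    struct payload_key k = { 20, 0x0000FFFFu, dport };
    return k;
}

/* Returns 1 if the payload matches the key, 0 otherwise (including
 * when the payload is too short to contain the word). */
int payload_key_match(const struct payload_key *k,
                      const uint8_t *payload, size_t len)
{
    if (len < k->offset + 4)
        return 0;
    const uint8_t *p = payload + k->offset;
    uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | p[3];
    return (w & k->mask) == k->value;
}
```

A fixed-offset match like this is only sound when an RDS header starts
at the beginning of the segment payload, which TCP does not guarantee;
that is one reason a socket-driven setup, rather than CLI-configured
DPI, seems attractive here.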
> I'm not talking about QoS at all. See the description above.
Understood, but I mentioned QoS because tc is typically used to specify
flows for QoS-managing algorithms like CBQ.
I realize that you are focused on offloading some of this to h/w,
but you mentioned a "name-based" socket, and tc hooks (for flows in the
inner L2 frame?), and that's the design detail I'm most interested in.
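If the tc hooks are meant to classify on the inner L2 frame, the
classifier has to look past the VXLAN header before it can see any
Ethernet fields. A minimal sketch, assuming an untagged inner Ethernet
frame and the RFC 7348 header layout; struct inner_l2_key and
vxlan_inner_key are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key for the inner L2 frame of a VXLAN packet. */
struct inner_l2_key {
    uint32_t vni;
    uint8_t  dst[6];
    uint8_t  src[6];
    uint16_t ethertype;
};

/* Parse a UDP payload carrying VXLAN: 8-byte VXLAN header, then an
 * untagged 14-byte Ethernet header. Returns 0 on success, -1 if the
 * payload is too short or the VNI-valid (I) flag is not set. */
int vxlan_inner_key(const uint8_t *udp_payload, size_t len,
                    struct inner_l2_key *key)
{
    const uint8_t *eth;

    if (len < 8 + 14 || !(udp_payload[0] & 0x08))
        return -1;
    eth = udp_payload + 8;
    key->vni = ((uint32_t)udp_payload[4] << 16) |
               ((uint32_t)udp_payload[5] << 8) | udp_payload[6];
    memcpy(key->dst, eth, 6);
    memcpy(key->src, eth + 6, 6);
    key->ethertype = (uint16_t)((eth[12] << 8) | eth[13]);
    return 0;
}
```

Whether such a key would be computed in software at the named socket or
handed to hardware is exactly the open design question above.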
--Sowmini