* [RFC 0/0] Introducing a generic socket offload framework
@ 2011-08-18 22:07 San Mehat
  2011-08-18 22:57 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: San Mehat @ 2011-08-18 22:07 UTC (permalink / raw)
  To: davem, mst, rusty
  Cc: linux-kernel, virtualization, netdev, digitaleric, mikew, miche,
	maccarro


TL;DR
-----
In this RFC we propose the introduction of the concept of hardware socket
offload to the Linux kernel. Patches will accompany this RFC in a few days,
but we felt we had enough on the design to solicit constructive discussion
from the community at-large.

BACKGROUND
----------
Many applications within enterprise organizations suitable for virtualization
neither require nor desire a connection to the full internal Ethernet+IP
network.  Rather, some specific socket connections -- for processing HTTP
requests, making database queries, or interacting with storage -- are needed,
and IP networking in the application may typically be discouraged for
applications that do not sit on the edge of the network. Furthermore, removing
the application's need to understand where its inputs come from / go to within
the networking fabric can make save/restore/migration of a virtualized
application substantially easier - especially in large clusters and on fabrics
which can't handle IP re-assignment.

REQUIREMENTS
------------
 * Allow VM connectivity to internal resources without requiring additional
   network resources (IPs, VLANs, etc).
 * Easy authentication of network streams from a trusted domain (vmm).
 * Protect host-kernel & network-fabric from direct exposure to untrusted
   packet data-structures.
 * Support for multiple distributions of Linux.
 * Minimal third-party software maintenance burden.
 * Coexist with the existing network stack and virtual Ethernet devices in the
   event that an application's specific requirements cannot be met by this
   design.

DESIGN
------
The Berkeley sockets coprocessor is a virtual PCI device which has the ability
to offload socket activity from an unmodified application at the BSD sockets
layer (Layer 4).  Offloaded socket requests bypass the local operating system's
networking stack entirely via the card and are relayed into the VMM
(Virtual Machine Manager) for processing. The VMM then passes the request to a
socket backend for handling. The difference between a socket backend and a
traditional VM ethernet backend is that the socket backend receives layer 4
socket (STREAM/DGRAM) requests instead of a multiplexed stream of layer 2
packets (ethernet) that must be interpreted by the host. This technique also
improves security isolation as the guest is no longer constructing packets which
are evaluated by the host or underlying network fabric; packet construction
happens in the host.

Lastly, pushing socket processing back into the host allows for host-side
control of the network protocols used, which limits the potential congestion
problems that can arise when various guests are using their own congestion
control algorithms.

================================================================================

           +-----------------------------------------------------------------+
           |                                                                 |
  guest    |                      unmodified application                     |
userspace  +-----------------------------------------------------------------+
           |                         unmodified libc                         |
           +-----------------------------------------------------------------+
                            |                             / \
                            |                              |
=========================== | ============================ | ===================
                            |                              |
                           \ /                             |
                 +------------------------------------------------------+
                 |                       socket core                    |
                 +----+============+------------------------------------+
                      |    INET    |                   |         / \
  guest               +-----+------+                   |          |
  kernel              | TCP | UDP  |                   |          |
                      +-----+------+                   | L4 reqs  |
                      |   NETDEV   |                   |          |
                      +------------+                   |          |
                      | virtio_net |                  \ /         |
                      +------------+               +------------------+
                          |   / \                  |    hw_socket     |
                          |    |                   +------------------+
                          |    |                   |  virtio_socket   |
                          |    |                   +------------------+
                          |    |                        |       / \
========================= | == | ====================== | ====== | =============
                         \ /   |                       \ /       |
  host           +---------------------+        +------------------------+
userspace        |  virtio net device  |        |  virtio socket device  |
  (vmm)          +---------------------+        +------------------------+
                 |  ethernet backend   |        |     socket backend     |
                 +---------------------+        +------------------------+
                        |     / \                      |        / \
                 L2     |      |                       |         |     L4
               packets  |      |                      \ /        |  requests
                        |      |                +-----------------------+
                        |      |                |    Socket Handlers    |
                        |      |                +-----------------------+
                        |      |                       |        / \
======================= | ==== | ===================== | ======= | =============
                        |      |                       |         |
   host                \ /     |                      \ /        |
  kernel

================================================================================

One of the most appealing aspects of this design (to application developers) is
that this approach can be completely transparent to the application, provided
we're able to intercept the application's socket requests in such a way that we
do not impact performance in a negative fashion, yet retain the API semantics
the application expects. In the event that this design is not suitable for an
application, the virtual machine may also be fitted with a normal virtual
ethernet device in addition to the co-processor (as shown in the diagram above).

Since we wish to allow these paravirtualized sockets to coexist peacefully with
the existing Linux socket system, we've chosen to introduce the idea that a
socket can at some point transition from being managed by the O/S socket system
to a more enlightened 'hardware assisted' socket. The transition is managed by
a 'socket coprocessor' component which intercepts and gets first right of
refusal on handling certain global socket calls (connect, sendto, bind, etc...).
In this initial design, the policy on whether to transition a socket or not is
made by the virtual hardware, although we understand that further measurement
of operation latency is warranted.

In the event the determination is made to transition a socket to hw-assisted
mode, the socket is marked as being assisted by hardware, and all socket
operations are offloaded to hardware.
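
A minimal sketch of the transition described above, written in Python purely
for illustration (the real work would happen in the guest kernel). The names
`Coprocessor`, `try_claim`, `SoftwareStack`, `GuestSocket`, and the
address-prefix policy are hypothetical stand-ins and are not taken from the
proposed patches; `hw_assist` plays the role of the hardware-assisted marking.

```python
class Coprocessor:
    """Hypothetical stand-in for the virtual hardware and its offload policy."""

    def __init__(self, offload_prefixes):
        self.offload_prefixes = tuple(offload_prefixes)
        self.handled = []  # operations the "hardware" has serviced

    def try_claim(self, op, address):
        """First right of refusal: True means the hardware takes the socket."""
        if address.startswith(self.offload_prefixes):
            self.handled.append((op, address))
            return True
        return False

    def submit(self, op, data):
        self.handled.append((op, data))
        return len(data)


class SoftwareStack:
    """Stand-in for the regular in-kernel network stack."""

    def __init__(self):
        self.handled = []

    def submit(self, op, data):
        self.handled.append((op, data))
        return len(data)


class GuestSocket:
    def __init__(self, coprocessor, stack):
        self.coprocessor = coprocessor
        self.stack = stack
        self.hw_assist = False  # set once the hardware claims the socket

    def connect(self, address):
        # The coprocessor is asked first; the stack only sees refused sockets.
        if self.coprocessor.try_claim("connect", address):
            self.hw_assist = True
        else:
            self.stack.submit("connect", address)

    def send(self, data):
        # Once marked, every operation is offloaded.
        target = self.coprocessor if self.hw_assist else self.stack
        return target.submit("send", data)
```

With a policy of `Coprocessor(["tcp://10."])`, a socket connected to
`tcp://10.0.0.5:80` is claimed and all later sends go to the coprocessor, while
a socket connected elsewhere never leaves the software stack.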

The following flag value has been added to struct socket (only visible within
the guest kernel):

 * SOCK_HWASSIST
    Indicates socket operations are handled by hardware

In order to support a variety of socket address families, addresses are
converted from their native socket family to an opaque string. Our initial
design formats these strings as URIs. The currently supported conversions are:

+-----------------------------------------------------------------------------+
|   Domain   |     Type      | URI example conversion                         |
+------------+---------------+------------------------------------------------+
|  AF_INET   |  SOCK_STREAM  | tcp://x.x.x.x:yyyy                             |
|  AF_INET   |  SOCK_DGRAM   | udp://x.x.x.x:yyyy                             |
|  AF_INET6  |  SOCK_STREAM  | tcp6://aaaa:b:cccc:d:eeee:ffff:gggg:hhhh/ii    |
|  AF_INET6  |  SOCK_DGRAM   | udp6://aaaa:b:cccc:d:eeee:ffff:gggg:hhhh/ii    |
|  AF_IPX    |  SOCK_DGRAM   | ipx://xxxxxxxx.yyyyyyyyyy.zzzz                 |
+-----------------------------------------------------------------------------+
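
As an illustration of the address-to-URI conversion, here is a small Python
sketch covering only the two AF_INET rows of the table. The function names
`sockaddr_to_uri` and `uri_to_sockaddr` are made up for this example, and the
IPv6 and IPX rows are left out because their exact string format is not
spelled out beyond the examples above.

```python
import ipaddress
import socket

# Only the two AF_INET rows of the table; other families are left out here.
SCHEMES = {
    (socket.AF_INET, socket.SOCK_STREAM): "tcp",
    (socket.AF_INET, socket.SOCK_DGRAM): "udp",
}


def sockaddr_to_uri(family, socktype, host, port):
    """Render an IPv4 socket address as an opaque URI string."""
    scheme = SCHEMES.get((family, socktype))
    if scheme is None:
        raise ValueError("unsupported family/type combination")
    ipaddress.IPv4Address(host)  # raises ValueError on a malformed address
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    return f"{scheme}://{host}:{port}"


def uri_to_sockaddr(uri):
    """Inverse of sockaddr_to_uri, as a backend might need."""
    scheme, sep, rest = uri.partition("://")
    host, colon, port = rest.rpartition(":")
    if not sep or not colon:
        raise ValueError("malformed URI")
    for (family, socktype), name in SCHEMES.items():
        if name == scheme:
            ipaddress.IPv4Address(host)  # validate before handing it back
            return family, socktype, host, int(port)
    raise ValueError("unsupported scheme")
```

For example, `sockaddr_to_uri(socket.AF_INET, socket.SOCK_STREAM, "192.0.2.10",
8080)` yields `tcp://192.0.2.10:8080`, and `uri_to_sockaddr` maps that string
back to the original tuple.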

In order for the socket coprocessor to take control of a socket, hooks must be
added to the socket core. Our initial implementation hooks a number of functions
in the socket-core (too many), and after consideration we feel we can reduce it
down considerably by managing the socket 'ops' pointers.
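
The idea of managing the 'ops' pointers instead of hooking every call can be
sketched as follows (Python, for illustration only). `Ops`, `Sock`, `hw_wants`
and the two entry points here are simplified, hypothetical stand-ins and not
the kernel's real structures: the point is only that the decision is taken once
and later calls need no hook of their own.

```python
class Ops:
    """A table of per-socket operations, analogous to a socket's 'ops' pointer."""

    def __init__(self, name):
        self.name = name

    def connect(self, sock, address):
        sock.trace.append((self.name, "connect", address))

    def sendmsg(self, sock, data):
        sock.trace.append((self.name, "sendmsg", data))
        return len(data)


INET_OPS = Ops("inet")      # regular stack
HW_OPS = Ops("hw_socket")   # offloaded path


class Sock:
    def __init__(self):
        self.ops = INET_OPS  # every socket starts on the regular stack
        self.trace = []


def hw_wants(address):
    """Hypothetical policy; in the RFC the (virtual) hardware decides."""
    return address.startswith("tcp://10.")


def sys_connect(sock, address):
    # Hooked entry point: decide once, then swap the whole ops table.
    if hw_wants(address):
        sock.ops = HW_OPS
    sock.ops.connect(sock, address)


def sys_sendmsg(sock, data):
    # Not hooked: dispatch simply follows whichever table is installed.
    return sock.ops.sendmsg(sock, data)
```

Compared with testing a flag in every socket call, this keeps the intercept
points limited to the calls where the decision is actually made.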

ALTERNATIVE STRATEGIES
----------------------

An alternative strategy for providing similar functionality involves either
modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
forced to rule this out due to the complexity (and fragility) involved with
attempting to maintain a general solution compatible across various
distributions where platform-libraries differ.

CAVEATS
-------

 * We're currently hooked into too many socket calls. We should be able to
   reduce the number of hooks to 3 (__sock_create(), sys_connect(), sys_bind()).

 * Our 'hw_socket' component should be folded into a netdev so we can leverage
   NAPI.

 * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.

 * We don't currently have support for /proc/net. Our current plan is to
   add '/proc/net/hwsock' (filename TBD) and add support for these sockets
   to the net-tools packages (netstat & friends), rather than muck around with
   plumbing hardware-assisted socket info into '/proc/net/tcp' and
   '/proc/net/udp'.

 * We don't currently have SOCK_DGRAM support implemented (work in progress).

 * We have insufficient integration testing in place (work in progress).


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 22:07 [RFC 0/0] Introducing a generic socket offload framework San Mehat
@ 2011-08-18 22:57 ` Alan Cox
  2011-08-18 23:03   ` Alan Cox
  2011-08-18 23:18   ` San Mehat
  2011-08-19  3:39 ` David Miller
  2011-08-19 12:49 ` jamal
  2 siblings, 2 replies; 10+ messages in thread
From: Alan Cox @ 2011-08-18 22:57 UTC (permalink / raw)
  To: San Mehat
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

> The Berkeley sockets coprocessor is a virtual PCI device which has the ability
> to offload socket activity from an unmodified application at the BSD sockets

Ok I think there is an important question here. Why is this being
designed for a specific virtual interface? Unix has always had the notion
that socket operations can be in part generic and that you can pass a
properly designed program a socket without any notion of what it is for.

> Lastly, pushing socket processing back into the host allows for host-side
> control of the network protocols used, which limits the potential congestion
> problems that can arise when various guests are using their own congestion
> control algorithms.

Does that not depend on which side does the congestion control and who
parcels out buffers?

> Since we wish to allow these paravirtualized sockets to coexist peacefully with
> the existing Linux socket system, we've chosen to introduce the idea that a
> socket can at some point transition from being managed by the O/S socket system
> to a more enlightened 'hardware assisted' socket. The transition is managed by
> a 'socket coprocessor' component which intercepts and gets first right of
> refusal on handling certain global socket calls (connect, sendto, bind, etc...).
> In this initial design, the policy on whether to transition a socket or not is
> made by the virtual hardware, although we understand that further measurement
> into operation latency is warranted.

Q: what happens about in process socket syscalls in another thread?
That's always been the ugly part in these cases, whether by intercepting or by
swapping file operations on an object.

>  * SOCK_HWASSIST
>     Indicates socket operations are handled by hardware

This guest only view means you can't use the abstraction for local
sockets too.

> In order to support a variety of socket address families, addresses are
> converted from their native socket family to an opaque string. Our initial
> design formats these strings as URIs. The currently supported conversions are:

That makes a lot of sense to me, because it's a well understood
abstraction and you can offload other stuff to this kind of generic
socket including things like http protocol acceleration, SSL and so on.

Plus it's always been annoying that you can't open a socket, but a URI
interface solves that...

>  * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.

But there is no reason SEQPACKET and RDM couldn't be added I assume?

Ok other questions

Suppose instead you just add an abstracted socket interface of

	AF_SOMETHING, PF_URI

it would be easy to convert programs. It would be easier to write
properly generic programs. It would be easy to write some small helpers that
are a good deal less insane than the existing inet ones. At that point
you could turn the problem on its head. Instead of 'borrowing' sockets
for a fairly specific concept of hw assist you ask the reverse question,
who can accelerate this URI be it some kind of virtual machine interface,
something funky like raw data over infiniband, or plain old 'use the
TCP/IP stack'.
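
A rough sketch of that inverted question ("who can accelerate this URI?"), in
Python for brevity. The accelerator names, the `storage.internal` host, the
`rdma` scheme and `choose_transport` are all hypothetical illustrations and not
an existing interface; the plain TCP/IP stack is the fallback when nobody
claims the URI.

```python
from urllib.parse import urlsplit


def vm_channel(parts):
    """Hypothetical: a service the VM host can reach on the guest's behalf."""
    return parts.scheme == "tcp" and parts.hostname == "storage.internal"


def infiniband(parts):
    """Hypothetical: raw data over an InfiniBand-style fabric."""
    return parts.scheme == "rdma"


# Asked in order; the first accelerator that accepts the URI wins.
ACCELERATORS = [("vm-channel", vm_channel), ("infiniband", infiniband)]


def choose_transport(uri):
    """Return the name of whoever will carry this URI."""
    parts = urlsplit(uri)
    for name, accepts in ACCELERATORS:
        if accepts(parts):
            return name
    return "tcpip"  # plain old 'use the TCP/IP stack'
```

In the simple case the choice is made once, when the URI is opened, and the
application never needs to know which transport was picked.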

Your decision making code is going to be interesting but it only has to
make the decision once in simple cases.

And yes there are still the complicated cases such as 'the routing table
has changed from virtual host to via siberia now what' but I don't believe
your proposal addresses that either.

Alan


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 22:57 ` Alan Cox
@ 2011-08-18 23:03   ` Alan Cox
  2011-08-18 23:18   ` San Mehat
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2011-08-18 23:03 UTC (permalink / raw)
  To: Alan Cox
  Cc: San Mehat, davem, mst, rusty, linux-kernel, virtualization,
	netdev, digitaleric, mikew, miche, maccarro

> Q: what happens about in process socket syscalls in another thread?
> That's always been the ugly part in these cases, whether by intercepting or by
> swapping file operations on an object.

Sorry I meant "in progress" 8)


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 22:57 ` Alan Cox
  2011-08-18 23:03   ` Alan Cox
@ 2011-08-18 23:18   ` San Mehat
  2011-08-19  9:28     ` Alan Cox
  1 sibling, 1 reply; 10+ messages in thread
From: San Mehat @ 2011-08-18 23:18 UTC (permalink / raw)
  To: Alan Cox
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

On Thu, Aug 18, 2011 at 3:57 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> The Berkeley sockets coprocessor is a virtual PCI device which has the ability
>> to offload socket activity from an unmodified application at the BSD sockets
>
> Ok I think there is an important question here. Why is this being
> designed for a specific virtual interface? Unix has always had the notion
> that socket operations can be in part generic and that you can pass a
> properly designed program a socket without any notion of what it is for.

Sorry Alan if I wasn't clear, but I'm not quite sure what you're asking...

If you're asking 'why have you only spec'ed out a virtual interface for
this' then my answer would be 'but of course you could design this in real
hardware and have a proper driver :)'. If you'd prefer that I call that out
specifically I'm happy to do so.

I have no desire to change the 'genericness' of sockets.. just the opposite -
I wish to introduce the notion that sockets (can be) completely generic (when
offloaded) as far as the guest is concerned.

>
>> Lastly, pushing socket processing back into the host allows for host-side
>> control of the network protocols used, which limits the potential congestion
>> problems that can arise when various guests are using their own congestion
>> control algorithms.
>
> Does that not depend on which side does the congestion control and who
> parcels out buffers?

It does, and it does.

>
>> Since we wish to allow these paravirtualized sockets to coexist peacefully with
>> the existing Linux socket system, we've chosen to introduce the idea that a
>> socket can at some point transition from being managed by the O/S socket system
>> to a more enlightened 'hardware assisted' socket. The transition is managed by
>> a 'socket coprocessor' component which intercepts and gets first right of
>> refusal on handling certain global socket calls (connect, sendto, bind, etc...).
>> In this initial design, the policy on whether to transition a socket or not is
>> made by the virtual hardware, although we understand that further measurement
>> into operation latency is warranted.
>
> Q: what happens about in process socket syscalls in another thread?
> That's always been the ugly part in these cases, whether by intercepting or by
> swapping file operations on an object.
>
>>  * SOCK_HWASSIST
>>     Indicates socket operations are handled by hardware
>
> This guest only view means you can't use the abstraction for local
> sockets too.
>

To be honest, the way we're attempting to integrate is in such a way that you
*could* offload AF_LOCAL sockets...  but that world gets a bit too much like
the 'Twilight Zone' for my current liking..

>> In order to support a variety of socket address families, addresses are
>> converted from their native socket family to an opaque string. Our initial
>> design formats these strings as URIs. The currently supported conversions are:
>
> That makes a lot of sense to me, because it's a well understood
> abstraction and you can offload other stuff to this kind of generic
> socket including things like http protocol acceleration, SSL and so on.
>
> Plus it's always been annoying that you can't open a socket, but a URI
> interface solves that...

Indeed.

>
>>  * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.
>
> But there is no reason SEQPACKET and RDM couldn't be added I assume?

No reason I can think of - we just did not have a specific requirement
for it at the time.

>
> Ok other questions
>
> Suppose instead you just add an abstracted socket interface of
>
>        AF_SOMETHING, PF_URI

Mike Waychison and I were saving the 'PF_URI' discussion for a future date,
but indeed we're on the same wave-length :). Our initial requirements are for
an 'extremely minimal burden of support' on the userspace environments, so we
decided to open up a separate discussion on PF_URI.

>
> it would be easy to convert programs. It would be easier to write
> properly generic programs. It would be easy to write some small helpers that
> are a good deal less insane than the existing inet ones. At that point
> you could turn the problem on its head. Instead of 'borrowing' sockets
> for a fairly specific concept of hw assist you ask the reverse question,
> who can accelerate this URI be it some kind of virtual machine interface,
> something funky like raw data over infiniband, or plain old 'use the
> TCP/IP stack'.

Completely agree.

>
> Your decision making code is going to be interesting but it only has to
> make the decision once in simple cases.

Yup.

>
> And yes there are still the complicated cases such as 'the routing table
> has changed from virtual host to via siberia now what' but I don't believe
> your proposal addresses that either.

Can you be more specific? If you mean solving the 'keeping your tcp connections
open to non virtual endpoints across a migration (or whatever)' then
no it doesn't :)

>
> Alan
>

Thanks man,

-san


-- 
San Mehat | Staff Software Engineer | san@google.com | 415-366-6172


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 22:07 [RFC 0/0] Introducing a generic socket offload framework San Mehat
  2011-08-18 22:57 ` Alan Cox
@ 2011-08-19  3:39 ` David Miller
  2011-08-19 12:49 ` jamal
  2 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2011-08-19  3:39 UTC (permalink / raw)
  To: san
  Cc: mst, rusty, linux-kernel, virtualization, netdev, digitaleric,
	mikew, miche, maccarro


I'm not reading any RFC without any example code, sorry.


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 23:18   ` San Mehat
@ 2011-08-19  9:28     ` Alan Cox
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2011-08-19  9:28 UTC (permalink / raw)
  To: San Mehat
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

> I have no desire to change the 'genericness' of sockets.. just the opposite -
> I wish to introduce the notion that sockets (can be) completely generic (when
> offloaded) as far as the guest is concerned.

I suppose my concern is that you don't want to design for a specific
offload device; your offload might change but the view from the
application side should not differ.

> > This guest only view means you can't use the abstraction for local
> > sockets too.
> >
> 
> To be honest, the way we're attempting to integrate is in such a way that you
> *could* offload AF_LOCAL sockets...  but that world gets a bit too much like
> the 'Twilight Zone' for my current liking..

Until you want to be able to have a pair of apps talking that may or may
not be on different systems and may or may not be on a vm host at all, at
which point having the same acceleration between them (a null accelerator
so to speak) would avoid having to add extra paths to the apps.

> > And yes there are still the complicated cases such as 'the routing table
> > has changed from virtual host to via siberia now what' but I don't believe
> > your proposal addresses that either.
> 
> Can you be more specific? If you mean solving the 'keeping your tcp connections
> open to non virtual endpoints across a migration (or whatever)' then
> no it doesn't :)

That was my assumption.



* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-18 22:07 [RFC 0/0] Introducing a generic socket offload framework San Mehat
  2011-08-18 22:57 ` Alan Cox
  2011-08-19  3:39 ` David Miller
@ 2011-08-19 12:49 ` jamal
  2011-08-19 14:44   ` San Mehat
  2011-08-19 14:58   ` San Mehat
  2 siblings, 2 replies; 10+ messages in thread
From: jamal @ 2011-08-19 12:49 UTC (permalink / raw)
  To: San Mehat
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

On Thu, 2011-08-18 at 15:07 -0700, San Mehat wrote:
> TL;DR
> -----
> In this RFC we propose the introduction of the concept of hardware socket
> offload to the Linux kernel. Patches will accompany this RFC in a few days,
> but we felt we had enough on the design to solicit constructive discussion
> from the community at-large.
> 

[..]

> ALTERNATIVE STRATEGIES
> ----------------------
> 
> An alternative strategy for providing similar functionality involves either
> modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
> forced to rule this out due to the complexity (and fragility) involved with
> attempting to maintain a general solution compatible across various
> distributions where platform-libraries differ.

Above should have been in your TL;DR;->

LD_PRELOAD is also horrible because of the granularity of the socket
calls;
Having things in the kernel and specifically tagging socket as needing
this feature is much much more manageable.

Tying things to virtualization may miss the big picture because there
are many other use cases for intercepting socket calls, example:
Samir Bellabes <sam@synack.fr> has been trying to get what he refers to
as "personal firewall" (equivalent to the silly windows firewall) which
prompts the user "ping from blah, do you want to allow a response?"
That requires intercepting send/recv calls, prompt the user in possibly
some pop-up and act based on response. It requires looking at content.
He is trying to use selinux for that interface,
but i think this would be the right abstraction.
I hope you plan to support send/recv.
I also hope you add support for SOCK_RAW (and maybe SOCK_PACKET).

Q: If you want this to be transparent to the apps, who/what is doing
the tagging of SOCK_HWASSIST? Clearly not the app if you don't want to
change it.

I like the URI abstraction if it doesn't come at the expense of the
app transparency.

cheers,
jamal


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-19 12:49 ` jamal
@ 2011-08-19 14:44   ` San Mehat
  2011-08-19 14:58   ` San Mehat
  1 sibling, 0 replies; 10+ messages in thread
From: San Mehat @ 2011-08-19 14:44 UTC (permalink / raw)
  To: jhs
  Cc: mst, netdev, miche, linux-kernel, virtualization, digitaleric,
	mikew, maccarro, davem



On Fri, Aug 19, 2011 at 5:49 AM, jamal <hadi@cyberus.ca> wrote:

> On Thu, 2011-08-18 at 15:07 -0700, San Mehat wrote:
> > TL;DR
> > -----
> > In this RFC we propose the introduction of the concept of hardware socket
> > offload to the Linux kernel. Patches will accompany this RFC in a few days,
> > but we felt we had enough on the design to solicit constructive discussion
> > from the community at-large.
> >
>
> [..]
>
> > ALTERNATIVE STRATEGIES
> > ----------------------
> >
> > An alternative strategy for providing similar functionality involves either
> > modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
> > forced to rule this out due to the complexity (and fragility) involved with
> > attempting to maintain a general solution compatible across various
> > distributions where platform-libraries differ.
>
> Above should have been in your TL;DR;->
>
> LD_PRELOAD is also horrible because of the granularity of the socket
> calls;
> Having things in the kernel and specifically tagging socket as needing
> this feature is much much more manageable.
>
> Tying things to virtualization may miss the big picture because there
> are many other use cases for intercepting socket calls, example:
> Samir Bellabes <sam@synack.fr> has been trying to get what he refers to
> as "personal firewall" (equivalent to the silly windows firewall) which
> prompts the user "ping from blah, do you want to allow a response?"
> That requires intercepting send/recv calls, prompt the user in possibly
> some pop-up and act based on response. It requires looking at content.
> He is trying to use selinux for that interface,
> but i think this would be the right abstraction.
>

I agree; there's no reason this needs to be tied to virtualization - it was
just the driving force behind the design. I will generalize the backend
interface types.


> I hope you plan to support send/recv.
>

yes


> I also hope you add support for SOCK_RAW (and maybe SOCK_PACKET).
>

Can you explain a good use-case for SOCK_RAW in this type of environment?
We were noodling it around locally and couldn't come up with one that we
needed to support.

> Q: If you want this to be transparent to the apps, who/what is doing
> the tagging of SOCK_HWASSIST? Clearly not the app if you don't want to
> change it.
>

The decision of whether to tag a socket or not is made by the 'hardware'.

> I like the URI abstraction if it doesn't come at the expense of the
> app transparency.
>

Thank you,

-san


> cheers,
> jamal
>
>


-- 
San Mehat | Staff Software Engineer | san@google.com | 415-366-6172



* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-19 12:49 ` jamal
  2011-08-19 14:44   ` San Mehat
@ 2011-08-19 14:58   ` San Mehat
  2011-08-20 14:32     ` jamal
  1 sibling, 1 reply; 10+ messages in thread
From: San Mehat @ 2011-08-19 14:58 UTC (permalink / raw)
  To: jhs
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

On Fri, Aug 19, 2011 at 5:49 AM, jamal <hadi@cyberus.ca> wrote:
>
> On Thu, 2011-08-18 at 15:07 -0700, San Mehat wrote:
> > TL;DR
> > -----
> > In this RFC we propose the introduction of the concept of hardware socket
> > offload to the Linux kernel. Patches will accompany this RFC in a few days,
> > but we felt we had enough on the design to solicit constructive discussion
> > from the community at-large.
> >
>
> [..]
>
> > ALTERNATIVE STRATEGIES
> > ----------------------
> >
> > An alternative strategy for providing similar functionality involves either
> > modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
> > forced to rule this out due to the complexity (and fragility) involved with
> > attempting to maintain a general solution compatible across various
> > distributions where platform-libraries differ.
>
> Above should have been in your TL;DR;->
>
> LD_PRELOAD is also horrible because of the granularity of the socket
> calls;
> Having things in the kernel and specifically tagging socket as needing
> this feature is much much more manageable.
>
> Tying things to virtualization may miss the big picture because there
> are many other use cases for intercepting socket calls, example:
> Samir Bellabes <sam@synack.fr> has been trying to get what he refers to
> as "personal firewall" (equivalent to the silly windows firewall) which
> prompts the user "ping from blah, do you want to allow a response?"
> That requires intercepting send/recv calls, prompt the user in possibly
> some pop-up and act based on response. It requires looking at content.
> He is trying to use selinux for that interface,
> but i think this would be the right abstraction.

I agree; there's no reason this needs to be tied to virtualization - it was
just the driving force behind the design. I will generalize the backend
interface types.

> I hope you plan to support send/recv.

yes

> I also hope you add support for SOCK_RAW (and maybe SOCK_PACKET).

Can you explain a good use-case for SOCK_RAW in this type of
environment? We were noodling it around locally and couldn't come up
with one that we needed to support.

>
> Q: If you want this to be transparent to the apps, who/what is doing
> the tagging of SOCK_HWASSIST? Clearly not the app if you don't want to
> change it.

The decision of whether to tag a socket or not is made by the 'hardware'

>
> I like the URI abstraction if it doesn't come at the expense of the
> app transparency.
>

Thank you

-san

> cheers,
> jamal
>



--
San Mehat | Staff Software Engineer | san@google.com | 415-366-6172


* Re: [RFC 0/0] Introducing a generic socket offload framework
  2011-08-19 14:58   ` San Mehat
@ 2011-08-20 14:32     ` jamal
  0 siblings, 0 replies; 10+ messages in thread
From: jamal @ 2011-08-20 14:32 UTC (permalink / raw)
  To: San Mehat
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro

On Fri, 2011-08-19 at 07:58 -0700, San Mehat wrote:

> Can you explain a good use-case for SOCK_RAW in this type of
> environment? We were noodling it around locally and couldn't come up
> with one that we needed to support.

One that comes to mind is the case of Samir's app: you'd need to handle
some of the apps that ride on top of IP, typically using SOCK_RAW,
e.g. ping, OSPF; essentially anything on IP that doesn't have its transport
built into the kernel.

> > Q: If you want this to be transparent to the apps, who/what is doing
> > the tagging of SOCK_HWASSIST? Clearly not the app if you don't want to
> > change it.
> 
> The decision of whether to tag a socket or not is made by the 'hardware'

As in some config interface? 

cheers,
jamal

