* [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
From: Sridhar Samudrala @ 2010-01-26 23:15 UTC
  To: Anthony Liguori
  Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi

On Tue, 2010-01-26 at 14:47 -0600, Anthony Liguori wrote:
> On 01/26/2010 02:40 PM, Sridhar Samudrala wrote:
> > This patch adds raw socket backend to qemu and is based on Or Gerlitz's
> > patch re-factored and ported to the latest qemu-kvm git tree.
> > It also includes support for vnet_hdr option that enables gso/checksum
> > offload with raw backend. You can find the linux kernel patch to support
> > this feature here.
> >     http://thread.gmane.org/gmane.linux.network/150308
> >
> > Signed-off-by: Sridhar Samudrala<sri@us.ibm.com>
> >    
> 
> See the previous discussion about the raw backend from Or's original 
> patch.  There's no obvious reason why we should have this in addition to 
> a tun/tap backend.
> 
> The only use case I know of is macvlan, but macvtap addresses this
> functionality while not introducing the rather nasty security problems
> associated with a raw backend.

The raw backend can be attached to a physical device, a macvlan device, or an
SR-IOV VF. I don't think the AF_PACKET socket itself introduces any security
problems. A raw socket can be created only by a user with the CAP_NET_RAW
capability. The only issue is if we need to assume that qemu itself is an
untrusted process, so that a raw fd cannot be passed to it.
But I think it is a useful backend to support in qemu: it provides guest to
remote host connectivity without the need for a bridge/tap.
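
To be clear about the privilege involved: the only step gated by
CAP_NET_RAW is creating the socket itself; without the capability it
fails with EPERM. A minimal (untested) sketch of that step:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>     /* htons */
    #include <net/ethernet.h>   /* ETH_P_ALL */

    static int open_packet_socket(void)
    {
        /* This call requires CAP_NET_RAW; it fails with EPERM otherwise */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        if (fd < 0)
            perror("socket(AF_PACKET, SOCK_RAW)");
        return fd;
    }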

macvtap could be an alternative if it supports binding to SR-IOV VFs too.

Thanks
Sridhar


* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
From: Anthony Liguori @ 2010-01-27  0:06 UTC
  To: Sridhar Samudrala
  Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi

On 01/26/2010 05:15 PM, Sridhar Samudrala wrote:
> On Tue, 2010-01-26 at 14:47 -0600, Anthony Liguori wrote:
>    
>> On 01/26/2010 02:40 PM, Sridhar Samudrala wrote:
>>      
>>> This patch adds raw socket backend to qemu and is based on Or Gerlitz's
>>> patch re-factored and ported to the latest qemu-kvm git tree.
>>> It also includes support for vnet_hdr option that enables gso/checksum
>>> offload with raw backend. You can find the linux kernel patch to support
>>> this feature here.
>>>      http://thread.gmane.org/gmane.linux.network/150308
>>>
>>> Signed-off-by: Sridhar Samudrala<sri@us.ibm.com>
>>>
>>>        
>> See the previous discussion about the raw backend from Or's original
>> patch.  There's no obvious reason why we should have this in addition to
>> a tun/tap backend.
>>
>> The only use case I know of is macvlan, but macvtap addresses this
>> functionality while not introducing the rather nasty security problems
>> associated with a raw backend.
>>      
> The raw backend can be attached to a physical device

This is equivalent to bridging with tun/tap except that it has the 
unexpected behaviour of unreliable host/guest networking (which is not 
universally consistent across platforms either).  This is not a mode we 
want to encourage users to use.

> , macvlan

macvtap is a superior way to achieve this use case because a macvtap fd 
can safely be given to a less privileged process without allowing 
escalation of privileges.

>   or SR-IOV VF.
>    

This depends on vhost-net.  In general, what I would like to see for 
this is something more user-friendly that deals specifically with this 
use case.  Although honestly, given the recent security concerns around 
raw sockets, I'm very concerned about supporting raw sockets in qemu at all.

Essentially, you get worse security doing vhost-net + raw + VF than with 
PCI passthrough + VF because at least in the latter case you can run qemu 
without privileges.  CAP_NET_RAW is a very big privilege.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
From: Arnd Bergmann @ 2010-01-27  6:52 UTC
  To: Anthony Liguori
  Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi,
	Sridhar Samudrala

On Wednesday 27 January 2010, Anthony Liguori wrote:
> > The raw backend can be attached to a physical device
> 
> This is equivalent to bridging with tun/tap except that it has the 
> unexpected behaviour of unreliable host/guest networking (which is not 
> universally consistent across platforms either).  This is not a mode we 
> want to encourage users to use.

It's not the most common scenario, but I've seen systems (I remember
one on s/390 with z/VM) where you really want to isolate the guest
network as much as possible from the host network. Besides PCI
passthrough, giving the host device to a guest using a raw socket
is the next best approximation of that.

Then again, macvtap will do that too, if the device driver supports
multiple unicast MAC addresses without forcing promiscuous mode.

> > , macvlan
> 
> macvtap is a superior way to achieve this use case because a macvtap fd 
> can safely be given to a less privileged process without allowing 
> escalation of privileges.

Yes.

> >   or SR-IOV VF.
> >    
> 
> This depends on vhost-net.

Why? I don't see anything in this scenario that is vhost-net specific.
I also plan to cover this aspect in macvtap in the future, but the current
code does not do it yet. It also requires device driver changes.

>                               In general, what I would like to see for 
> this is something more user-friendly that deals specifically with this 
> use case.  Although honestly, given the recent security concerns around 
> raw sockets, I'm very concerned about supporting raw sockets in qemu at all.
> 
> Essentially, you get worse security doing vhost-net + raw + VF than with 
> PCI passthrough + VF because at least in the latter case you can run qemu 
> without privileges.  CAP_NET_RAW is a very big privilege.

It can be contained to a large degree with network namespaces. When you
run qemu in its own namespace and add the VF to that, CAP_NET_RAW
should ideally have no effect on other parts of the system (barring
bugs in the namespace implementation).
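
To sketch what I mean (assuming a kernel built with CONFIG_NET_NS; the
VF still has to be moved into the new namespace from the outside, e.g.
over netlink, and unshare() itself needs CAP_SYS_ADMIN):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    static int enter_private_netns(void)
    {
        /* After this, the process sees only its own (down) loopback
         * until an interface is moved into the namespace from outside. */
        if (unshare(CLONE_NEWNET) < 0) {
            perror("unshare(CLONE_NEWNET)");
            return -1;
        }
        return 0;
    }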

	Arnd


* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
From: Anthony Liguori @ 2010-01-27 14:14 UTC
  To: Arnd Bergmann
  Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi,
	Sridhar Samudrala

On 01/27/2010 12:52 AM, Arnd Bergmann wrote:
> On Wednesday 27 January 2010, Anthony Liguori wrote:
>    
>>> The raw backend can be attached to a physical device
>>>        
>> This is equivalent to bridging with tun/tap except that it has the
>> unexpected behaviour of unreliable host/guest networking (which is not
>> universally consistent across platforms either).  This is not a mode we
>> want to encourage users to use.
>>      
> It's not the most common scenario, but I've seen systems (I remember
> one on s/390 with z/VM) where you really want to isolate the guest
> network as much as possible from the host network. Besides PCI
> passthrough, giving the host device to a guest using a raw socket
> is the next best approximation of that.
>    

But if you care about isolation, it's the worst possible thing to do.  
If a guest breaks into qemu, it's one bind() away from accessing any 
other guest's network.
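
To make that concrete: once a process holds a packet socket, re-pointing
it at another interface is a single bind() call, roughly like this
(sketch; the target ifindex would come from SIOCGIFINDEX):

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netpacket/packet.h>
    #include <net/ethernet.h>

    /* A compromised process can re-point an existing AF_PACKET socket
     * at any other interface visible in its network namespace. */
    static int rebind_raw(int fd, int other_ifindex)
    {
        struct sockaddr_ll sll;

        memset(&sll, 0, sizeof(sll));
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex  = other_ifindex;

        return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
    }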

Using a bridge with a single interface on it is much better from an 
isolation perspective.

>>                                In general, what I would like to see for
>> this is something more user-friendly that deals specifically with this
>> use case.  Although honestly, given the recent security concerns around
>> raw sockets, I'm very concerned about supporting raw sockets in qemu at all.
>>
>> Essentially, you get worse security doing vhost-net + raw + VF than with
>> PCI passthrough + VF because at least in the latter case you can run qemu
>> without privileges.  CAP_NET_RAW is a very big privilege.
>>      
> It can be contained to a large degree with network namespaces. When you
> run qemu in its own namespace and add the VF to that, CAP_NET_RAW
> should ideally have no effect on other parts of the system (except
> bugs in the namespace implementation).
>    

That's a pretty big hammer to hit this problem with.  QEMU should not 
require CAP_NET_RAW and so far has been able to avoid it quite 
successfully.  So far, I haven't heard a compelling reason to use raw 
other than that bridging can be complicated to set up.

If we had the equivalent of a raw socket that could be bound to an 
interface and then "locked" such that it could be safely handed to a 
non-privileged process, then it would be a different story.

Regards,

Anthony Liguori


* [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
From: Sridhar Samudrala @ 2010-01-29 20:52 UTC
  To: Arnd Bergmann
  Cc: markmc, vivk, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi

On Wed, 2010-01-27 at 14:56 -0800, Sridhar Samudrala wrote:
> On Wed, 2010-01-27 at 22:39 +0100, Arnd Bergmann wrote:
> > On Wednesday 27 January 2010, Anthony Liguori wrote:
> > > >> I think -net socket,fd should just be (trivially) extended to work with raw
> > > >> sockets out of the box, with no support for opening it. Then you can have
> > > >> libvirt or some wrapper open a raw socket and a private namespace and just pass it
> > > >> down.
> > > >>      
> > > > That'd work. Anthony?
> > > 
> > > The fundamental problem that I have with all of this is that we should 
> > > not be introducing new network backends that are based around something 
> > > only a developer is going to understand.  If I'm a user and I want to 
> > > use an external switch in VEPA mode, how in the world am I going to know 
> > > that I'm supposed to use the -net raw backend or the -net socket 
> > > backend?  It might as well be the -net butterflies backend as far as a 
> > > user is concerned.
> > 
> > My point is that we already have -net socket,fd and any user that passes
> > an fd into that already knows what he wants to do with it. Making it
> > work with raw sockets is just a natural extension to this, which works
> > on all kernels and (with separate namespaces) is reasonably secure.
> 
> Didn't realize that -net socket is already there and supports TCP and
> UDP sockets. I will look into extending -net socket to support AF_PACKET
> SOCK_RAW type sockets.

OK. Here is a patch that adds AF_PACKET SOCK_RAW support to the -netdev socket
backend. It allows specifying an already opened raw fd or an ifname to which a
raw socket can be bound.

   -netdev socket,fd=X,id=str
   -netdev socket,ifname=<ethX/macvlanX>,id=str
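
For the fd=X case, the idea is that a privileged wrapper (libvirt etc.)
opens and binds the raw socket and hands the fd to qemu. One way a
wrapper could pass it over a unix socket (sketch, error handling
omitted; how the received fd ends up as qemu's fd=X, e.g. via the
monitor's getfd command, is up to the wrapper):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int send_fd(int unix_sock, int raw_fd)
    {
        char dummy = 0;
        char buf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(buf, 0, sizeof(buf));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);

        /* Attach the raw fd as SCM_RIGHTS ancillary data */
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &raw_fd, sizeof(int));

        return sendmsg(unix_sock, &msg, 0) < 0 ? -1 : 0;
    }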

However, I found that struct NetSocketState doesn't include all the state info
that is required to support AF_PACKET raw sockets, so I had to add
NetSocketRawState and couldn't reuse much of the code.

I think the -net socket backend is geared more towards AF_INET sockets; adding
support for a new socket family doesn't fit nicely with the existing code.
But if this approach is more acceptable than a new -net raw,fd backend, I am
fine with it.

Thanks
Sridhar

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index eba578a..7d62dd9 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -15,6 +15,7 @@
 #include "net.h"
 #include "net/checksum.h"
 #include "net/tap.h"
+#include "net/socket.h"
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
@@ -133,6 +134,9 @@ static int peer_has_vnet_hdr(VirtIONet *n)
     case NET_CLIENT_TYPE_TAP:
         n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        n->has_vnet_hdr = sock_raw_has_vnet_hdr(n->nic->nc.peer);
+        break;
     default:
         return 0;            
     }
@@ -149,6 +153,9 @@ static int peer_has_ufo(VirtIONet *n)
     case NET_CLIENT_TYPE_TAP:
         n->has_ufo = tap_has_ufo(n->nic->nc.peer);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        n->has_ufo = sock_raw_has_ufo(n->nic->nc.peer);
+        break;
     default:
         return 0;            
     }
@@ -165,6 +172,9 @@ static void peer_using_vnet_hdr(VirtIONet *n, int using_vnet_hdr)
     case NET_CLIENT_TYPE_TAP:
         tap_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        sock_raw_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
+        break;
     default:
         break; 
     }
@@ -180,6 +190,9 @@ static void peer_set_offload(VirtIONet *n, int csum, int tso4, int tso6,
     case NET_CLIENT_TYPE_TAP:
         tap_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        sock_raw_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
+        break;
     default:
         break; 
     }
diff --git a/net.c b/net.c
index 6ef93e6..3d25d64 100644
--- a/net.c
+++ b/net.c
@@ -1002,6 +1002,11 @@ static struct {
                 .type = QEMU_OPT_STRING,
                 .help = "UDP multicast address and port number",
             },
+            {
+                .name = "ifname",
+                .type = QEMU_OPT_STRING,
+                .help = "interface name",
+            },
             { /* end of list */ }
         },
 #ifdef CONFIG_VDE
diff --git a/net.h b/net.h
index 116bb80..74b3e69 100644
--- a/net.h
+++ b/net.h
@@ -34,7 +34,8 @@ typedef enum {
     NET_CLIENT_TYPE_TAP,
     NET_CLIENT_TYPE_SOCKET,
     NET_CLIENT_TYPE_VDE,
-    NET_CLIENT_TYPE_DUMP
+    NET_CLIENT_TYPE_DUMP,
+    NET_CLIENT_TYPE_SOCKET_RAW,
 } net_client_type;
 
 typedef void (NetPoll)(VLANClientState *, bool enable);
diff --git a/net/socket.c b/net/socket.c
index 5533737..56f5bad 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -32,6 +32,338 @@
 #include "qemu_socket.h"
 #include "sysemu.h"
 
+#include <netpacket/packet.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+
+/* Maximum GSO packet size (64k) plus plenty of room for
+ * the ethernet and virtio_net headers
+ */
+#define RAW_BUFSIZE (4096 + 65536)
+
+typedef struct NetSocketRawState {
+    VLANClientState nc;
+    int fd;
+    uint8_t buf[RAW_BUFSIZE];
+    int promisc;
+    unsigned int read_poll:1;
+    unsigned int write_poll:1;
+    unsigned int has_vnet_hdr:1;
+    unsigned int using_vnet_hdr:1;
+    unsigned int has_ufo:1;
+} NetSocketRawState;
+
+struct virtio_net_hdr
+{
+    uint8_t flags;
+    uint8_t gso_type;
+    uint16_t hdr_len;
+    uint16_t gso_size;
+    uint16_t csum_start;
+    uint16_t csum_offset;
+};
+
+static int sock_raw_can_send(void *opaque);
+static void sock_raw_send(void *opaque);
+static void sock_raw_writable(void *opaque);
+
+static void sock_raw_update_fd_handler(NetSocketRawState *s)
+{
+    qemu_set_fd_handler2(s->fd,
+                         s->read_poll  ? sock_raw_can_send : NULL,
+                         s->read_poll  ? sock_raw_send     : NULL,
+                         s->write_poll ? sock_raw_writable : NULL,
+                         s);
+}
+
+static void sock_raw_read_poll(NetSocketRawState *s, int enable)
+{
+    s->read_poll = !!enable;
+    sock_raw_update_fd_handler(s);
+}
+
+static void sock_raw_write_poll(NetSocketRawState *s, int enable)
+{
+    s->write_poll = !!enable;
+    sock_raw_update_fd_handler(s);
+}
+
+static void sock_raw_writable(void *opaque)
+{
+    NetSocketRawState *s = opaque;
+
+    sock_raw_write_poll(s, 0);
+    qemu_flush_queued_packets(&s->nc);
+}
+
+static ssize_t sock_raw_write_packet(NetSocketRawState *s,
+                                     const struct iovec *iov,
+                                     int iovcnt)
+{
+    ssize_t len;
+
+    do {
+        len = writev(s->fd, iov, iovcnt);
+    } while (len == -1 && errno == EINTR);
+
+    if (len == -1 && errno == EAGAIN) {
+        sock_raw_write_poll(s, 1);
+        return 0;
+    }
+
+    if (len == -1)
+        fprintf(stderr, "sock_raw_write_packet: errno:%d\n", errno);
+
+    return len;
+}
+
+static ssize_t sock_raw_receive_iov(VLANClientState *nc,
+                                    const struct iovec *iov,
+                                    int iovcnt)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+    const struct iovec *iovp = iov;
+    struct iovec iov_copy[iovcnt + 1];
+    struct virtio_net_hdr hdr = { 0, };
+
+    if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+        iov_copy[0].iov_base = &hdr;
+        iov_copy[0].iov_len =  sizeof(hdr);
+        memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
+        iovp = iov_copy;
+        iovcnt++;
+    }
+
+    return sock_raw_write_packet(s, iovp, iovcnt);
+}
+
+static ssize_t sock_raw_receive_raw(VLANClientState *nc, const uint8_t *buf,
+                                    size_t size)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+    struct iovec iov[2];
+    int iovcnt = 0;
+    struct virtio_net_hdr hdr = { 0, };
+
+    if (s->has_vnet_hdr) {
+        iov[iovcnt].iov_base = &hdr;
+        iov[iovcnt].iov_len  = sizeof(hdr);
+        iovcnt++;
+    }
+
+    iov[iovcnt].iov_base = (char *)buf;
+    iov[iovcnt].iov_len  = size;
+    iovcnt++;
+
+    return sock_raw_write_packet(s, iov, iovcnt);
+}
+
+static ssize_t sock_raw_receive(VLANClientState *nc, const uint8_t *buf,
+                                size_t size)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+    struct iovec iov[1];
+
+    if (s->has_vnet_hdr && !s->using_vnet_hdr)
+        return sock_raw_receive_raw(nc, buf, size);
+
+    iov[0].iov_base = (char *)buf;
+    iov[0].iov_len  = size;
+
+    return sock_raw_write_packet(s, iov, 1);
+}
+
+static int sock_raw_can_send(void *opaque)
+{
+    NetSocketRawState *s = opaque;
+
+    return qemu_can_send_packet(&s->nc);
+}
+
+ssize_t sock_raw_read_packet(int fd, uint8_t *buf, int maxlen, int flags)
+{
+    ssize_t ret;
+
+    ret = recv(fd, buf, maxlen, flags);
+    return ret;
+}
+
+static void sock_raw_send_completed(VLANClientState *nc, ssize_t len)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    sock_raw_read_poll(s, 1);
+}
+
+static void sock_raw_send(void *opaque)
+{
+    NetSocketRawState *s = opaque;
+    int size;
+
+    do {
+        uint8_t *buf = s->buf;
+
+        size = sock_raw_read_packet(s->fd, s->buf, sizeof(s->buf), MSG_TRUNC);
+        if (size <= 0)
+            break;
+
+        if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+            buf  += sizeof(struct virtio_net_hdr);
+            size -= sizeof(struct virtio_net_hdr);
+        }
+
+        size = qemu_send_packet_async(&s->nc, buf, size,
+                                      sock_raw_send_completed);
+        if (size == 0)
+            sock_raw_read_poll(s, 0);
+
+    } while (size > 0 && qemu_can_send_packet(&s->nc));
+}
+
+int sock_raw_has_ufo(VLANClientState *nc)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+
+    return s->has_ufo;
+}
+
+int sock_raw_has_vnet_hdr(VLANClientState *nc)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+
+    return s->has_vnet_hdr;
+}
+
+void sock_raw_using_vnet_hdr(VLANClientState *nc, int using_vnet_hdr)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    using_vnet_hdr = using_vnet_hdr != 0;
+
+    assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+    assert(s->has_vnet_hdr == using_vnet_hdr);
+
+    s->using_vnet_hdr = using_vnet_hdr;
+}
+
+void sock_raw_set_offload(VLANClientState *nc, int csum, int tso4,
+                     int tso6, int ecn, int ufo)
+{
+    return;
+}
+
+static void sock_raw_cleanup(VLANClientState *nc)
+{
+    NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    qemu_purge_queued_packets(nc);
+
+    sock_raw_read_poll(s, 0);
+    sock_raw_write_poll(s, 0);
+    close(s->fd);
+}
+
+int sock_raw_probe_vnet_hdr(int fd)
+{
+    int val;
+    socklen_t len = sizeof(val);
+
+    if (getsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &val, &len) == 0)
+        return 1;
+
+    return 0;
+}
+
+static NetClientInfo net_raw_info = {
+    .type = NET_CLIENT_TYPE_SOCKET_RAW,
+    .size = sizeof(NetSocketRawState),
+    .receive = sock_raw_receive,
+    .receive_raw = NULL,
+    .receive_iov = sock_raw_receive_iov,
+    .cleanup = sock_raw_cleanup,
+};
+
+
+static NetSocketRawState *net_socket_fd_init_raw(VLANState *vlan,
+                                                 const char *model,
+                                                 const char *name, int fd)
+{
+    VLANClientState *nc;
+    NetSocketRawState *s;
+
+    nc = qemu_new_net_client(&net_raw_info, vlan, NULL, model, name);
+
+    s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+    s->fd = fd;
+    s->has_vnet_hdr = sock_raw_probe_vnet_hdr(fd);
+    s->using_vnet_hdr = 0;
+    s->has_ufo = 1;
+    sock_raw_read_poll(s, 1);
+
+    return s;
+}
+
+static int net_socket_raw_ifname_init(VLANState *vlan, const char *model,
+				      const char *name, const char *ifname)
+{
+    struct ifreq req;
+    int fd, ret;
+    struct sockaddr_ll lladdr;
+    int val;
+
+    fd = qemu_socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+    if (fd < 0) {
+        fprintf(stderr, "packet socket failed\n");
+        return -1;
+    }
+
+    memset(&req, 0, sizeof(req));
+    strncpy(req.ifr_name, ifname, IFNAMSIZ-1);
+    ret = ioctl(fd, SIOCGIFINDEX, &req);
+    if (ret < 0) {
+        fprintf(stderr, "SIOCGIFINDEX failed\n");
+        close(fd);
+        return -1;
+    }
+
+    memset(&lladdr, 0, sizeof(lladdr));
+    lladdr.sll_family   = AF_PACKET;
+    lladdr.sll_protocol = htons(ETH_P_ALL);
+    lladdr.sll_ifindex  = req.ifr_ifindex;
+    ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr));
+    if (ret < 0) {
+        fprintf(stderr, "bind failed\n");
+        close(fd);
+        return -1;
+    }
+
+    /* PACKET_VNET_HDR may be unsupported; warn and run without offloads */
+    val = 1;
+    ret = setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, (const char *)&val,
+                     sizeof(val));
+    if (ret < 0) {
+        fprintf(stderr, "setsockopt(SOL_PACKET, PACKET_VNET_HDR) failed\n");
+    }
+
+    ret = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
+    if (ret < 0) {
+        fprintf(stderr, "fcntl(O_NONBLOCK) set failed\n");
+        close(fd);
+        return -1;
+    }
+
+    net_socket_fd_init_raw(vlan, model, name, fd);
+
+    return 0;
+}
+
+
 typedef struct NetSocketState {
     VLANClientState nc;
     int fd;
@@ -337,6 +669,8 @@ static NetSocketState *net_socket_fd_init(VLANState *vlan,
         return net_socket_fd_init_dgram(vlan, model, name, fd, is_connected);
     case SOCK_STREAM:
         return net_socket_fd_init_stream(vlan, model, name, fd, is_connected);
+    case SOCK_RAW:
+        return (struct NetSocketState *)net_socket_fd_init_raw(vlan, model, name, fd);
     default:
         /* who knows ... this could be a eg. a pty, do warn and continue as stream */
         fprintf(stderr, "qemu: warning: socket type=%d for fd=%d is not SOCK_DGRAM or SOCK_STREAM\n", so_type, fd);
@@ -519,6 +853,22 @@ int net_init_socket(QemuOpts *opts,
             close(fd);
             return -1;
         }
+    } else if (qemu_opt_get(opts, "ifname")) {
+        const char *ifname;
+
+        if (qemu_opt_get(opts, "fd") ||
+            qemu_opt_get(opts, "connect") ||
+            qemu_opt_get(opts, "listen") ||
+            qemu_opt_get(opts, "mcast")) {
+            qemu_error("fd=, connect=, listen= and mcast= are invalid with ifname=\n");
+            return -1;
+        }
+
+        ifname = qemu_opt_get(opts, "ifname");
+
+        if (net_socket_raw_ifname_init(vlan, "socket", name, ifname) == -1) {
+            return -1;
+        }
     } else if (qemu_opt_get(opts, "listen")) {
         const char *listen;
 
diff --git a/net/socket.h b/net/socket.h
index ea46f02..cc09866 100644
--- a/net/socket.h
+++ b/net/socket.h
@@ -30,4 +30,13 @@
 int net_init_socket(QemuOpts *opts, Monitor *mon,
                     const char *name, VLANState *vlan);
 
+#define PACKET_VNET_HDR	15
+
+ssize_t sock_raw_read_packet(int fd, uint8_t *buf, int maxlen, int flags);
+int sock_raw_has_ufo(VLANClientState *vc);
+int sock_raw_has_vnet_hdr(VLANClientState *vc);
+void sock_raw_using_vnet_hdr(VLANClientState *vc, int using_vnet_hdr);
+int sock_raw_probe_vnet_hdr(int fd);
+void sock_raw_set_offload(VLANClientState *vc, int csum, int tso4, int tso6, int ecn, int ufo);
+
 #endif /* QEMU_NET_SOCKET_H */

