* [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
[not found] ` <4B5F54E8.3080507@codemonkey.ws>
@ 2010-01-26 23:15 ` Sridhar Samudrala
2010-01-27 0:06 ` Anthony Liguori
0 siblings, 1 reply; 5+ messages in thread
From: Sridhar Samudrala @ 2010-01-26 23:15 UTC (permalink / raw)
To: Anthony Liguori
Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi
On Tue, 2010-01-26 at 14:47 -0600, Anthony Liguori wrote:
> On 01/26/2010 02:40 PM, Sridhar Samudrala wrote:
> > This patch adds a raw socket backend to qemu and is based on Or Gerlitz's
> > patch, refactored and ported to the latest qemu-kvm git tree.
> > It also includes support for the vnet_hdr option that enables GSO/checksum
> > offload with the raw backend. You can find the Linux kernel patch to support
> > this feature here:
> > http://thread.gmane.org/gmane.linux.network/150308
> >
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >
>
> See the previous discussion about the raw backend from Or's original
> patch. There's no obvious reason why we should have this in addition to
> a tun/tap backend.
>
> The only use-case I know of is macvlan, but macvtap addresses this
> functionality while not introducing the rather nasty security problems
> associated with a raw backend.
The raw backend can be attached to a physical device, a macvlan device or an SR-IOV VF.
I don't think the AF_PACKET socket itself introduces any security problems. A
raw socket can be created only by a user with the CAP_NET_RAW capability. The only
issue is whether we need to assume that qemu itself is an untrusted process, so that a
raw fd cannot be passed to it.
But I think it is a useful backend to support in qemu: it provides
guest-to-remote-host connectivity without the need for a bridge/tap.
macvtap could be an alternative if it supports binding to SR-IOV VFs too.
Thanks
Sridhar
* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
2010-01-26 23:15 ` [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu Sridhar Samudrala
@ 2010-01-27 0:06 ` Anthony Liguori
2010-01-27 6:52 ` Arnd Bergmann
0 siblings, 1 reply; 5+ messages in thread
From: Anthony Liguori @ 2010-01-27 0:06 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi
On 01/26/2010 05:15 PM, Sridhar Samudrala wrote:
> On Tue, 2010-01-26 at 14:47 -0600, Anthony Liguori wrote:
>
>> On 01/26/2010 02:40 PM, Sridhar Samudrala wrote:
>>
>>> This patch adds a raw socket backend to qemu and is based on Or Gerlitz's
>>> patch, refactored and ported to the latest qemu-kvm git tree.
>>> It also includes support for the vnet_hdr option that enables GSO/checksum
>>> offload with the raw backend. You can find the Linux kernel patch to support
>>> this feature here:
>>> http://thread.gmane.org/gmane.linux.network/150308
>>>
>>> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>>>
>>>
>> See the previous discussion about the raw backend from Or's original
>> patch. There's no obvious reason why we should have this in addition to
>> a tun/tap backend.
>>
>> The only use-case I know of is macvlan, but macvtap addresses this
>> functionality while not introducing the rather nasty security problems
>> associated with a raw backend.
>>
> The raw backend can be attached to a physical device
This is equivalent to bridging with tun/tap except that it has the
unexpected behaviour of unreliable host/guest networking (which is not
universally consistent across platforms either). This is not a mode we
want to encourage users to use.
> , macvlan
macvtap is a superior way to achieve this use case because a macvtap fd
can safely be given to a less-privileged process without allowing
escalation of privileges.
> or SR-IOV VF.
>
This depends on vhost-net. In general, what I would like to see for
this is something more user-friendly that deals specifically with this
use case. Although, honestly, given the recent security concerns around
raw sockets, I'm very concerned about supporting raw sockets in qemu at all.
Essentially, you get worse security doing vhost-net + raw + VF than with
PCI passthrough + VF, because at least in the latter case you can run qemu
without privileges. CAP_NET_RAW is a very big privilege.
Regards,
Anthony Liguori
* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
2010-01-27 0:06 ` Anthony Liguori
@ 2010-01-27 6:52 ` Arnd Bergmann
2010-01-27 14:14 ` Anthony Liguori
0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2010-01-27 6:52 UTC (permalink / raw)
To: Anthony Liguori
Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi,
Sridhar Samudrala
On Wednesday 27 January 2010, Anthony Liguori wrote:
> > The raw backend can be attached to a physical device
>
> This is equivalent to bridging with tun/tap except that it has the
> unexpected behaviour of unreliable host/guest networking (which is not
> universally consistent across platforms either). This is not a mode we
> want to encourage users to use.
It's not the most common scenario, but I've seen systems (I remember
one on s/390 with z/VM) where you really want to isolate the guest
network as much as possible from the host network. Besides PCI
passthrough, giving the host device to a guest using a raw socket
is the next best approximation of that.
Then again, macvtap will do that too, if the device driver supports
multiple unicast MAC addresses without forcing promiscuous mode.
> > , macvlan
>
>> macvtap is a superior way to achieve this use case because a macvtap fd
>> can safely be given to a less-privileged process without allowing
>> escalation of privileges.
Yes.
> > or SR-IOV VF.
> >
>
> This depends on vhost-net.
Why? I don't see anything in this scenario that is vhost-net specific.
I also plan to cover this aspect in macvtap in the future, but the current
code does not do it yet. It also requires device driver changes.
> In general, what I would like to see for
> this is something more user-friendly that deals specifically with this
> use case. Although, honestly, given the recent security concerns around
> raw sockets, I'm very concerned about supporting raw sockets in qemu at all.
>
> Essentially, you get worse security doing vhost-net + raw + VF than with
> PCI passthrough + VF, because at least in the latter case you can run qemu
> without privileges. CAP_NET_RAW is a very big privilege.
It can be contained to a large degree with network namespaces. When you
run qemu in its own namespace and add the VF to that, CAP_NET_RAW
should ideally have no effect on other parts of the system (barring
bugs in the namespace implementation).
Arnd
* Re: [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
2010-01-27 6:52 ` Arnd Bergmann
@ 2010-01-27 14:14 ` Anthony Liguori
0 siblings, 0 replies; 5+ messages in thread
From: Anthony Liguori @ 2010-01-27 14:14 UTC (permalink / raw)
To: Arnd Bergmann
Cc: markmc, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi,
Sridhar Samudrala
On 01/27/2010 12:52 AM, Arnd Bergmann wrote:
> On Wednesday 27 January 2010, Anthony Liguori wrote:
>
>>> The raw backend can be attached to a physical device
>>>
>> This is equivalent to bridging with tun/tap except that it has the
>> unexpected behaviour of unreliable host/guest networking (which is not
>> universally consistent across platforms either). This is not a mode we
>> want to encourage users to use.
>>
> It's not the most common scenario, but I've seen systems (I remember
> one on s/390 with z/VM) where you really want to isolate the guest
> network as much as possible from the host network. Besides PCI
> passthrough, giving the host device to a guest using a raw socket
> is the next best approximation of that.
>
But if you care about isolation, it's the worst possible thing to do.
If a guest breaks into qemu, it's one bind() away from accessing any
other guest's network.
Using a bridge with a single interface on it is much better from an
isolation perspective.
>> In general, what I would like to see for
>> this is something more user-friendly that deals specifically with this
>> use case. Although, honestly, given the recent security concerns around
>> raw sockets, I'm very concerned about supporting raw sockets in qemu at all.
>>
>> Essentially, you get worse security doing vhost-net + raw + VF than with
>> PCI passthrough + VF, because at least in the latter case you can run qemu
>> without privileges. CAP_NET_RAW is a very big privilege.
>>
> It can be contained to a large degree with network namespaces. When you
> run qemu in its own namespace and add the VF to that, CAP_NET_RAW
> should ideally have no effect on other parts of the system (barring
> bugs in the namespace implementation).
>
That's a pretty big hammer to hit this problem with. QEMU should not
require CAP_NET_RAW and so far has been able to avoid it quite
successfully. So far, I haven't heard a compelling reason to use
raw other than that bridging can be complicated to set up.
If we had the equivalent of a raw socket that could be bound to a device
and then "locked" such that it could be safely handed to a
non-privileged process, then it would be a different story.
Regards,
Anthony Liguori
* [Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
[not found] ` <1264632990.20320.106.camel@w-sridhar.beaverton.ibm.com>
@ 2010-01-29 20:52 ` Sridhar Samudrala
0 siblings, 0 replies; 5+ messages in thread
From: Sridhar Samudrala @ 2010-01-29 20:52 UTC (permalink / raw)
To: Arnd Bergmann
Cc: markmc, vivk, kvm, Michael S. Tsirkin, qemu-devel, ogerlitz, avi
On Wed, 2010-01-27 at 14:56 -0800, Sridhar Samudrala wrote:
> On Wed, 2010-01-27 at 22:39 +0100, Arnd Bergmann wrote:
> > On Wednesday 27 January 2010, Anthony Liguori wrote:
> > > >> I think -net socket,fd should just be (trivially) extended to work with raw
> > > >> sockets out of the box, with no support for opening it. Then you can have
> > > >> libvirt or some wrapper open a raw socket and a private namespace and just pass it
> > > >> down.
> > > >>
> > > > That'd work. Anthony?
> > >
> > > The fundamental problem that I have with all of this is that we should
> > > not be introducing new network backends that are based around something
> > > only a developer is going to understand. If I'm a user and I want to
> > > use an external switch in VEPA mode, how in the world am I going to know
> > > that I'm supposed to use the -net raw backend or the -net socket
> > > backend? It might as well be the -net butterflies backend as far as a
> > > user is concerned.
> >
> > My point is that we already have -net socket,fd and any user that passes
> > an fd into that already knows what he wants to do with it. Making it
> > work with raw sockets is just a natural extension to this, which works
> > on all kernels and (with separate namespaces) is reasonably secure.
>
> Didn't realize that -net socket is already there and supports TCP and
> UDP sockets. I will look into extending -net socket to support AF_PACKET
> SOCK_RAW type sockets.
OK. Here is a patch that adds AF_PACKET SOCK_RAW support to the -netdev socket
backend. It allows specifying an already-opened raw fd, or an ifname to which a
raw socket can be bound:
-netdev socket,fd=X,id=str
-netdev socket,ifname=<ethX/macvlanX>,id=str
However, I found that struct NetSocketState doesn't include all the state info that
is required to support AF_PACKET raw sockets. So I had to add NetSocketRawState
and also couldn't reuse much of the code.
I think the -net socket backend is more geared towards AF_INET sockets. Adding support
for a new socket family doesn't fit nicely with the existing code.
But if this approach is more acceptable than a new -net raw,fd backend, I am fine
with it.
Thanks
Sridhar
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index eba578a..7d62dd9 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -15,6 +15,7 @@
#include "net.h"
#include "net/checksum.h"
#include "net/tap.h"
+#include "net/socket.h"
#include "qemu-timer.h"
#include "virtio-net.h"
@@ -133,6 +134,9 @@ static int peer_has_vnet_hdr(VirtIONet *n)
case NET_CLIENT_TYPE_TAP:
n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
break;
+ case NET_CLIENT_TYPE_SOCKET_RAW:
+ n->has_vnet_hdr = sock_raw_has_vnet_hdr(n->nic->nc.peer);
+ break;
default:
return 0;
}
@@ -149,6 +153,9 @@ static int peer_has_ufo(VirtIONet *n)
case NET_CLIENT_TYPE_TAP:
n->has_ufo = tap_has_ufo(n->nic->nc.peer);
break;
+ case NET_CLIENT_TYPE_SOCKET_RAW:
+ n->has_ufo = sock_raw_has_ufo(n->nic->nc.peer);
+ break;
default:
return 0;
}
@@ -165,6 +172,9 @@ static void peer_using_vnet_hdr(VirtIONet *n, int using_vnet_hdr)
case NET_CLIENT_TYPE_TAP:
tap_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
break;
+ case NET_CLIENT_TYPE_SOCKET_RAW:
+ sock_raw_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
+ break;
default:
break;
}
@@ -180,6 +190,9 @@ static void peer_set_offload(VirtIONet *n, int csum, int tso4, int tso6,
case NET_CLIENT_TYPE_TAP:
tap_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
break;
+ case NET_CLIENT_TYPE_SOCKET_RAW:
+ sock_raw_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
+ break;
default:
break;
}
diff --git a/net.c b/net.c
index 6ef93e6..3d25d64 100644
--- a/net.c
+++ b/net.c
@@ -1002,6 +1002,11 @@ static struct {
.type = QEMU_OPT_STRING,
.help = "UDP multicast address and port number",
},
+ {
+ .name = "ifname",
+ .type = QEMU_OPT_STRING,
+ .help = "interface name",
+ },
{ /* end of list */ }
},
#ifdef CONFIG_VDE
diff --git a/net.h b/net.h
index 116bb80..74b3e69 100644
--- a/net.h
+++ b/net.h
@@ -34,7 +34,8 @@ typedef enum {
NET_CLIENT_TYPE_TAP,
NET_CLIENT_TYPE_SOCKET,
NET_CLIENT_TYPE_VDE,
- NET_CLIENT_TYPE_DUMP
+ NET_CLIENT_TYPE_DUMP,
+ NET_CLIENT_TYPE_SOCKET_RAW,
} net_client_type;
typedef void (NetPoll)(VLANClientState *, bool enable);
diff --git a/net/socket.c b/net/socket.c
index 5533737..56f5bad 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -32,6 +32,327 @@
#include "qemu_socket.h"
#include "sysemu.h"
+#include <netpacket/packet.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+
+/* Maximum GSO packet size (64k) plus plenty of room for
+ * the ethernet and virtio_net headers
+ */
+#define RAW_BUFSIZE (4096 + 65536)
+
+typedef struct NetSocketRawState {
+ VLANClientState nc;
+ int fd;
+ uint8_t buf[RAW_BUFSIZE];
+ int promisc;
+ unsigned int read_poll:1;
+ unsigned int write_poll:1;
+ unsigned int has_vnet_hdr:1;
+ unsigned int using_vnet_hdr:1;
+ unsigned int has_ufo:1;
+} NetSocketRawState;
+
+struct virtio_net_hdr
+{
+ uint8_t flags;
+ uint8_t gso_type;
+ uint16_t hdr_len;
+ uint16_t gso_size;
+ uint16_t csum_start;
+ uint16_t csum_offset;
+};
+
+static int sock_raw_can_send(void *opaque);
+static void sock_raw_send(void *opaque);
+static void sock_raw_writable(void *opaque);
+
+static void sock_raw_update_fd_handler(NetSocketRawState *s)
+{
+ qemu_set_fd_handler2(s->fd,
+ s->read_poll ? sock_raw_can_send : NULL,
+ s->read_poll ? sock_raw_send : NULL,
+ s->write_poll ? sock_raw_writable : NULL,
+ s);
+}
+
+static void sock_raw_read_poll(NetSocketRawState *s, int enable)
+{
+ s->read_poll = !!enable;
+ sock_raw_update_fd_handler(s);
+}
+
+static void sock_raw_write_poll(NetSocketRawState *s, int enable)
+{
+ s->write_poll = !!enable;
+ sock_raw_update_fd_handler(s);
+}
+
+static void sock_raw_writable(void *opaque)
+{
+ NetSocketRawState *s = opaque;
+
+ sock_raw_write_poll(s, 0);
+ qemu_flush_queued_packets(&s->nc);
+}
+
+static ssize_t sock_raw_write_packet(NetSocketRawState *s,
+ const struct iovec *iov,
+ int iovcnt)
+{
+ ssize_t len;
+
+ do {
+ len = writev(s->fd, iov, iovcnt);
+ } while (len == -1 && errno == EINTR);
+
+ if (len == -1 && errno == EAGAIN) {
+ sock_raw_write_poll(s, 1);
+ return 0;
+ }
+
+ if (len == -1)
+ printf("raw_write_packet: errno:%d\n", errno);
+
+ return len;
+}
+
+static ssize_t sock_raw_receive_iov(VLANClientState *nc,
+ const struct iovec *iov,
+ int iovcnt)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+ const struct iovec *iovp = iov;
+ struct iovec iov_copy[iovcnt + 1];
+ struct virtio_net_hdr hdr = { 0, };
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ iov_copy[0].iov_base = &hdr;
+ iov_copy[0].iov_len = sizeof(hdr);
+ memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
+ iovp = iov_copy;
+ iovcnt++;
+ }
+
+ return sock_raw_write_packet(s, iovp, iovcnt);
+}
+
+static ssize_t sock_raw_receive_raw(VLANClientState *nc, const uint8_t *buf,
+ size_t size)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+ struct iovec iov[2];
+ int iovcnt = 0;
+ struct virtio_net_hdr hdr = { 0, };
+
+ if (s->has_vnet_hdr) {
+ iov[iovcnt].iov_base = &hdr;
+ iov[iovcnt].iov_len = sizeof(hdr);
+ iovcnt++;
+ }
+
+ iov[iovcnt].iov_base = (char *)buf;
+ iov[iovcnt].iov_len = size;
+ iovcnt++;
+
+ return sock_raw_write_packet(s, iov, iovcnt);
+}
+
+static ssize_t sock_raw_receive(VLANClientState *nc, const uint8_t *buf,
+ size_t size)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+ struct iovec iov[1];
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr)
+ return sock_raw_receive_raw(nc, buf, size);
+
+ iov[0].iov_base = (char *)buf;
+ iov[0].iov_len = size;
+
+ return sock_raw_write_packet(s, iov, 1);
+}
+
+static int sock_raw_can_send(void *opaque)
+{
+ NetSocketRawState *s = opaque;
+
+ return qemu_can_send_packet(&s->nc);
+}
+
+ssize_t sock_raw_read_packet(int fd, uint8_t *buf, int maxlen, int flags)
+{
+ int ret;
+
+ ret = recv(fd, buf, maxlen, flags);
+ return ret;
+}
+
+static void sock_raw_send_completed(VLANClientState *nc, ssize_t len)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ sock_raw_read_poll(s, 1);
+}
+
+static void sock_raw_send(void *opaque)
+{
+ NetSocketRawState *s = opaque;
+ int size;
+
+ do {
+ uint8_t *buf = s->buf;
+
+ size = sock_raw_read_packet(s->fd, s->buf, sizeof(s->buf), MSG_TRUNC);
+ if (size <= 0)
+ break;
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ buf += sizeof(struct virtio_net_hdr);
+ size -= sizeof(struct virtio_net_hdr);
+ }
+
+ size = qemu_send_packet_async(&s->nc, buf, size,
+ sock_raw_send_completed);
+ if (size == 0)
+ sock_raw_read_poll(s, 0);
+
+ } while (size > 0 && qemu_can_send_packet(&s->nc));
+}
+
+int sock_raw_has_ufo(VLANClientState *nc)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+
+ return s->has_ufo;
+}
+
+int sock_raw_has_vnet_hdr(VLANClientState *nc)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+
+ return s->has_vnet_hdr;
+}
+
+void sock_raw_using_vnet_hdr(VLANClientState *nc, int using_vnet_hdr)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ using_vnet_hdr = using_vnet_hdr != 0;
+
+ assert(nc->info->type == NET_CLIENT_TYPE_SOCKET_RAW);
+ assert(s->has_vnet_hdr == using_vnet_hdr);
+
+ s->using_vnet_hdr = using_vnet_hdr;
+}
+
+void sock_raw_set_offload(VLANClientState *nc, int csum, int tso4,
+ int tso6, int ecn, int ufo)
+{
+ return;
+}
+
+static void sock_raw_cleanup(VLANClientState *nc)
+{
+ NetSocketRawState *s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ qemu_purge_queued_packets(nc);
+
+ sock_raw_read_poll(s, 0);
+ sock_raw_write_poll(s, 0);
+ close(s->fd);
+}
+
+int sock_raw_probe_vnet_hdr(int fd)
+{
+ int val, len;
+
+ len = sizeof(val);
+ if (getsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &val, (socklen_t *)&len)
+ == 0)
+ return 1;
+
+ return 0;
+}
+
+static NetClientInfo net_raw_info = {
+ .type = NET_CLIENT_TYPE_SOCKET_RAW,
+ .size = sizeof(NetSocketRawState),
+ .receive = sock_raw_receive,
+ .receive_raw = NULL,
+ .receive_iov = sock_raw_receive_iov,
+ .cleanup = sock_raw_cleanup,
+};
+
+
+static NetSocketRawState *net_socket_fd_init_raw(VLANState *vlan,
+ const char *model,
+ const char *name, int fd)
+{
+ VLANClientState *nc;
+ NetSocketRawState *s;
+
+ nc = qemu_new_net_client(&net_raw_info, vlan, NULL, model, name);
+
+ s = DO_UPCAST(NetSocketRawState, nc, nc);
+
+ s->fd = fd;
+ s->has_vnet_hdr = sock_raw_probe_vnet_hdr(fd);
+ s->using_vnet_hdr = 0;
+ s->has_ufo = 1;
+ sock_raw_read_poll(s, 1);
+
+ return s;
+}
+
+static int net_socket_raw_ifname_init(VLANState *vlan, const char *model,
+ const char *name, const char *ifname)
+{
+ struct ifreq req;
+ int fd, ret;
+ struct sockaddr_ll lladdr;
+ int val;
+
+ fd = qemu_socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+ if (fd < 0)
+ fprintf(stderr, "packet socket failed\n");
+
+ memset(&req, 0, sizeof(req));
+ strncpy(req.ifr_name, ifname, IFNAMSIZ-1);
+ ret = ioctl(fd, SIOCGIFINDEX, &req);
+ if (ret < 0)
+ fprintf(stderr, "SIOCGIFINDEX failed\n");
+
+ memset(&lladdr, 0, sizeof(lladdr));
+ lladdr.sll_family = AF_PACKET;
+ lladdr.sll_protocol = htons(ETH_P_ALL);
+ lladdr.sll_ifindex = req.ifr_ifindex;
+ ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr));
+ if (ret < 0)
+ fprintf(stderr, "bind failed\n");
+
+ val = 1;
+ ret = setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, (const char *)&val,
+ sizeof(val));
+ if (ret < 0) {
+ fprintf(stderr, "setsockopt(SOL_PACKET, PACKET_VNET_HDR) failed\n");
+ }
+
+ ret = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
+ if (ret < 0)
+ fprintf(stderr, "fcntl(O_NONBLOCK) set failed\n");
+
+ net_socket_fd_init_raw(vlan, model, name, fd);
+
+ return 0;
+}
+
+
typedef struct NetSocketState {
VLANClientState nc;
int fd;
@@ -337,6 +658,8 @@ static NetSocketState *net_socket_fd_init(VLANState *vlan,
return net_socket_fd_init_dgram(vlan, model, name, fd, is_connected);
case SOCK_STREAM:
return net_socket_fd_init_stream(vlan, model, name, fd, is_connected);
+ case SOCK_RAW:
+ return (struct NetSocketState *)net_socket_fd_init_raw(vlan, model, name, fd);
default:
/* who knows ... this could be a eg. a pty, do warn and continue as stream */
fprintf(stderr, "qemu: warning: socket type=%d for fd=%d is not SOCK_DGRAM or SOCK_STREAM\n", so_type, fd);
@@ -519,6 +842,22 @@ int net_init_socket(QemuOpts *opts,
close(fd);
return -1;
}
+ } else if (qemu_opt_get(opts, "ifname")) {
+ const char *ifname;
+
+ if (qemu_opt_get(opts, "fd") ||
+ qemu_opt_get(opts, "connect") ||
+ qemu_opt_get(opts, "listen") ||
+ qemu_opt_get(opts, "mcast")) {
+ qemu_error("fd=, connect=, listen= and mcast= are invalid with ifname=\n");
+ return -1;
+ }
+
+ ifname = qemu_opt_get(opts, "ifname");
+
+ if (net_socket_raw_ifname_init(vlan, "socket", name, ifname) == -1) {
+ return -1;
+ }
} else if (qemu_opt_get(opts, "listen")) {
const char *listen;
diff --git a/net/socket.h b/net/socket.h
index ea46f02..cc09866 100644
--- a/net/socket.h
+++ b/net/socket.h
@@ -30,4 +30,13 @@
int net_init_socket(QemuOpts *opts, Monitor *mon,
const char *name, VLANState *vlan);
+#define PACKET_VNET_HDR 15
+
+ssize_t sock_raw_read_packet(int fd, uint8_t *buf, int maxlen, int flags);
+int sock_raw_has_ufo(VLANClientState *vc);
+int sock_raw_has_vnet_hdr(VLANClientState *vc);
+void sock_raw_using_vnet_hdr(VLANClientState *vc, int using_vnet_hdr);
+int sock_raw_probe_vnet_hdr(int fd);
+void sock_raw_set_offload(VLANClientState *vc, int csum, int tso4, int tso6, int ecn, int ufo);
+
#endif /* QEMU_NET_SOCKET_H */