[Qemu-devel] Re: [PATCH-updated] qemu/net: add raw backend

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Anthony Liguori <anthony@codemonkey.ws>
To: Or Gerlitz <ogerlitz@voltaire.com>
Cc: Arnd Bergmann <arndbergmann@googlemail.com>,
	qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>
Subject: [Qemu-devel] Re: [PATCH-updated] qemu/net: add raw backend
Date: Wed, 14 Oct 2009 09:46:33 -0500	[thread overview]
Message-ID: <4AD5E449.9070301@codemonkey.ws> (raw)
In-Reply-To: <20091014143415.GA29937@redhat.com>

Or Gerlitz wrote:
> Add raw network backend option which uses a packet socket to provide
> raw networking access. Once the socket is opened it's bound to a
> provided host interface, such that packets received on the interface
> are delivered to the VM and packets sent by the VM are sent to the
> interface.
>
> This is functionally similar to the existing pcap network
> backend, with the same advantages and problems.
> Differences from pcap:
> - can get an open socket from the monitor,
>   which allows running without NET_ADMIN priviledges
> - support iovec sends with writev, saving one data copy
> - one less dependency on an external library
> - we have access to the underlying file descriptor
>   which makes it possible to connect to vhost net
> - don't support polling all interfaces, always bind to a specific one
>   

Networking is probably the area in qemu that users most frequently 
stumble with.  The most common problems are:

1) slirp does not behave how they think it should (icmp doesn't work, 
guest isn't accessable from host)
2) it's difficult to figure out which backend behaves the way they want 
(socket vs. vde vs. tap)
3) when they figure out they need tap, tap is difficult to setup 

The problem with introducing a raw backend (or a pcap backend) is that 
it makes #2 even worse because now a user has to figure out whether they 
need raw/pcap or tap.  But most troubling, it introduces another issue:

4) raw does not behave how they think it should because guest<->host 
networking does not work bidirectionally

So unless there's an extremely compelling reason to have this, I'd 
rather not introduce this complexity.  NB, I see this as a problem with 
vhost_net too if #4 is also true in that context.

> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio-net.c |    3 +-
>  net.c           |  192 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-options.hx |    4 +
>  3 files changed, 198 insertions(+), 1 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 1ac05a2..95d9f93 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -545,7 +545,8 @@ static ssize_t virtio_net_receive2(VLANClientState *vc, const uint8_t *buf, size
>              virtqueue_pop(n->rx_vq, &elem) == 0) {
>              if (i == 0)
>                  return -1;
> -            fprintf(stderr, "virtio-net truncating packet\n");
> +            fprintf(stderr, "virtio-net truncating packet. offset %zd size %zd\n",
> +		    offset, size);
>              exit(1);
>          }
>   

This doesn't belong here.
> diff --git a/net.c b/net.c
> index d93eaef..1e0e874 100644
> --- a/net.c
> +++ b/net.c
> @@ -93,6 +93,9 @@
>  #endif
>  #endif
>  
> +#include <netpacket/packet.h>
> +#include <net/ethernet.h>
> +
>   

This is certainly missing guards.

>  #if defined(__OpenBSD__)
>  #include <util.h>
>  #endif
> @@ -1860,6 +1863,158 @@ static TAPState *net_tap_init(VLANState *vlan, const char *model,
>  
>  #endif /* !_WIN32 */
>  
> +typedef struct RAWState {
> +    VLANClientState *vc;
> +    int fd;
> +    uint8_t buf[4096];
> +    int promisc;
> +} RAWState;
> +
> +static int net_raw_fd_init(Monitor *mon, const char *ifname, int promisc)
> +{
> +	int fd, ret;
> +	struct ifreq req;
> +	struct sockaddr_ll lladdr;
> +
> +	fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> +	if (fd < 0)
> +		fprintf(stderr, "packet socket failed\n");
>   

CodingStyle

Also, this error checking should use the monitor error reporting 
framework.  And falling through with fd=-1 certainly means we'll SEGV or 
worse further down.

> +	memset(&req, 0, sizeof(req));
> +	strncpy(req.ifr_name, ifname, IFNAMSIZ-1);
>   

Would be better to just use snprintf

> +	ret = ioctl(fd, SIOCGIFINDEX, &req);
> +	if (ret < 0)
> +		fprintf(stderr, "SIOCGIFINDEX failed\n");
> +
> +	memset(&lladdr, 0, sizeof(lladdr));
> +	lladdr.sll_family   = AF_PACKET;
> +	lladdr.sll_protocol = htons(ETH_P_ALL);
> +	lladdr.sll_ifindex  = req.ifr_ifindex;
> +	ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr));
> +	if (ret < 0)
> +		fprintf(stderr, "bind failed\n");
>
>   

Error checking is bad here.

> +	/* set iface to promiscuous mode (packets sent to the VM MAC) */
> +	if (promisc) {
> +		ret = ioctl(fd, SIOCGIFFLAGS, &req);
> +		if (ret < 0)
> +			perror("SIOCGIFFLAGS failed\n");
> +		req.ifr_flags |= IFF_PROMISC;
> +		ret = ioctl(fd, SIOCSIFFLAGS, &req);
> +		if (ret < 0)
> +			fprintf(stderr, "SIOCSIFFLAGS to promiscous failed\n");
> +	}
>   

I suspect these ioctls are Linux specific.

> +	ret = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
> +	if (ret < 0)
> +		fprintf(stderr, "O_NONBLOCK set failed\n");
> +
> +	return fd;
> +}
> +
> +static void raw_cleanup(VLANClientState *vc)
> +{
> +	struct ifreq req;
> +	RAWState *s = vc->opaque;
> +
> +	qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> +	if (s->promisc) {
> +		ioctl(s->fd, SIOCGIFFLAGS, &req);
> +		req.ifr_flags &= ~IFF_PROMISC;
> +		ioctl(s->fd, SIOCSIFFLAGS, &req);
> +	}
> +	close(s->fd);
> +	qemu_free(s);
> +}
> +
> +static void raw_send(void *opaque);
> +
> +static int raw_can_send(void *opaque)
> +{
> +	RAWState *s = opaque;
> +
> +	return qemu_can_send_packet(s->vc);
> +}
> +
> +static void raw_send_completed(VLANClientState *vc, ssize_t len)
> +{
> +	RAWState *s = vc->opaque;
> +
> +	qemu_set_fd_handler2(s->fd, raw_can_send, raw_send, NULL, s);
> +}
> +
> +static void raw_send(void *opaque)
> +{
> +	RAWState *s = opaque;
> +	int size;
> +
> +	do {
> +		size = recv(s->fd, s->buf, sizeof(s->buf), MSG_TRUNC);
> +		if (size <= 0)
> +			break;
>   

Need to handle EINTR.

> +		size = qemu_send_packet_async(s->vc, s->buf, size,
> +						raw_send_completed);
> +		if (size == 0)
> +			qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> +
> +	} while (size > 0);
> +}
> +
> +static ssize_t raw_receive_iov(VLANClientState *vc, const struct iovec *iov,
> +				int iovcnt)
> +{
> +	ssize_t len;
> +	RAWState *s = vc->opaque;
> +
> +	do {
> +		len = writev(s->fd, iov, iovcnt);
> +	} while (len == -1 && (errno == EINTR || errno == EAGAIN));
>   

Spinning on EAGAIN is certainly wrong.

> +static int net_raw_init(Monitor *mon, VLANState *vlan, const char *model,
> +			const char *name, const char *ifname,
> +			int promisc, int fd)
> +{
> +	RAWState *s;
> +
> +	s = qemu_mallocz(sizeof(RAWState));
> +
> +	if (fd == -1) {
> +		s->fd = net_raw_fd_init(mon, ifname, promisc);
> +		s->promisc = promisc;
> +	} else
> +		s->fd = fd;
> +
> +	fcntl(s->fd, F_SETFL, O_NONBLOCK);
>   

For net_raw_fd_int, you've already set O_NONBLOCK but you're also 
removing any other flags that have been set which is probably wrong for 
a passed in fd.

> +	s->vc = qemu_new_vlan_client(vlan, model, name, NULL, raw_receive,
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bde3e3f..0d5440f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -825,6 +825,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>      "                default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n"
>  #endif
>  #endif
> +    "-net raw[,vlan=n][,name=str],ifname=name[,promisc=m]\n"
> +    "                bound the host network interface to VLAN 'n' in a raw manner:\n"
> +    "                packets received on the interface are delivered to the vlan and\n"
> +    "                packets delivered on the vlan are sent to the interface\n"
>      "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
>      "                connect the vlan 'n' to another VLAN using a socket connection\n"
>      "-net socket[,vlan=n][,name=str][,fd=h][,mcast=maddr:port]\n"
>   

Needs documentation.

Regards,

Anthony Liguori

next prev parent reply	other threads:[~2009-10-14 14:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-14 14:34 [Qemu-devel] [PATCH-updated] qemu/net: add raw backend Or Gerlitz
2009-10-14 14:46 ` Anthony Liguori [this message]
2009-10-14 15:14   ` [Qemu-devel] " Jamie Lokier
2009-10-14 15:58     ` Anthony Liguori
2009-10-14 16:14       ` Michael S. Tsirkin
2009-10-14 16:54         ` Jamie Lokier
2009-10-14 17:20           ` Michael S. Tsirkin
2009-10-14 18:36         ` Anthony Liguori
2009-10-14 19:37           ` Michael S. Tsirkin
2009-10-15  7:48           ` Or Gerlitz
2009-10-14 15:58     ` Michael S. Tsirkin
2009-10-14 15:24   ` Michael S. Tsirkin
2009-10-14 15:33     ` Gleb Natapov
2009-10-15  7:29       ` Or Gerlitz
2009-10-15  7:44         ` Gleb Natapov
2009-10-15  7:50           ` Or Gerlitz
2009-10-14 16:20   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AD5E449.9070301@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=arndbergmann@googlemail.com \
    --cc=mst@redhat.com \
    --cc=ogerlitz@voltaire.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).