Re: [Qemu-devel] Re: [PATCH-updated] qemu/net: add raw backend

From: Gleb Natapov <gleb@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Or Gerlitz <ogerlitz@voltaire.com>,
	Arnd Bergmann <arndbergmann@googlemail.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [PATCH-updated] qemu/net: add raw backend
Date: Wed, 14 Oct 2009 17:33:00 +0200
Message-ID: <20091014153300.GA30889@redhat.com>
In-Reply-To: <20091014152451.GA30179@redhat.com>

On Wed, Oct 14, 2009 at 05:24:52PM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 14, 2009 at 09:46:33AM -0500, Anthony Liguori wrote:
> > Or Gerlitz wrote:
> >> Add raw network backend option which uses a packet socket to provide
> >> raw networking access. Once the socket is opened it's bound to a
> >> provided host interface, such that packets received on the interface
> >> are delivered to the VM and packets sent by the VM are sent to the
> >> interface.
> >>
> >> This is functionally similar to the existing pcap network
> >> backend, with the same advantages and problems.
> >> Differences from pcap:
> >> - can get an open socket from the monitor,
> >>   which allows running without NET_ADMIN privileges
> >> - support iovec sends with writev, saving one data copy
> >> - one less dependency on an external library
> >> - we have access to the underlying file descriptor
> >>   which makes it possible to connect to vhost net
> >> - don't support polling all interfaces, always bind to a specific one
> >>   
> >
> > Networking is probably the area in qemu that users most frequently  
> > stumble with.  The most common problems are:
> >
> > 1) slirp does not behave how they think it should (icmp doesn't work,  
> > guest isn't accessible from host)
> > 2) it's difficult to figure out which backend behaves the way they want  
> > (socket vs. vde vs. tap)
> > 3) when they figure out they need tap, tap is difficult to set up
> 
> tap does not need any setup. problem is, bridge needs setup.
> 
tap needs an IP address assigned.

> > The problem with introducing a raw backend (or a pcap backend)
> > is that  
> > it makes #2 even worse because now a user has to figure out whether they  
> > need raw/pcap or tap.  But most troubling, it introduces another issue:
> >
> > 4) raw does not behave how they think it should because guest<->host  
> > networking does not work bidirectionally
> 
> OTOH icmp works fine, so we are not worse off than with slirp.
> 
> > So unless there's an extremely compelling reason to have this,
> 
> I work with remote machines all the time, having to fiddle with host
> bridge/network setup means I always risk losing the only way to admin
> the machine.  So it's slirp or raw for me.  If I'm the only one like
> this, I can keep maintaining this patch, but I doubt it.
> 
You are not the only one, but slirp works fine for my purposes.

> I consider this a compelling reason.
> 
> > I'd rather not introduce this complexity.
> 
> Does another option really add that much complexity?
> We add options all the time ...
> 
> >  NB, I see this as a problem with  
> > vhost_net too if #4 is also true in that context.
> 
> It's up to user whether to connect vhost net to tap or socket.
> I haven't posted userspace code to connect vhost to tap yet
> but I will RSN.
> 
> >> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/virtio-net.c |    3 +-
> >>  net.c           |  192 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  qemu-options.hx |    4 +
> >>  3 files changed, 198 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> >> index 1ac05a2..95d9f93 100644
> >> --- a/hw/virtio-net.c
> >> +++ b/hw/virtio-net.c
> >> @@ -545,7 +545,8 @@ static ssize_t virtio_net_receive2(VLANClientState *vc, const uint8_t *buf, size
> >>              virtqueue_pop(n->rx_vq, &elem) == 0) {
> >>              if (i == 0)
> >>                  return -1;
> >> -            fprintf(stderr, "virtio-net truncating packet\n");
> >> +            fprintf(stderr, "virtio-net truncating packet. offset %zd size %zd\n",
> >> +		    offset, size);
> >>              exit(1);
> >>          }
> >>   
> >
> > This doesn't belong here.
> >> diff --git a/net.c b/net.c
> >> index d93eaef..1e0e874 100644
> >> --- a/net.c
> >> +++ b/net.c
> >> @@ -93,6 +93,9 @@
> >>  #endif
> >>  #endif
> >> +#include <netpacket/packet.h>
> >> +#include <net/ethernet.h>
> >> +
> >>   
> >
> > This is certainly missing guards.
> >
> >>  #if defined(__OpenBSD__)
> >>  #include <util.h>
> >>  #endif
> >> @@ -1860,6 +1863,158 @@ static TAPState *net_tap_init(VLANState *vlan, const char *model,
> >>  #endif /* !_WIN32 */
> >> +typedef struct RAWState {
> >> +    VLANClientState *vc;
> >> +    int fd;
> >> +    uint8_t buf[4096];
> >> +    int promisc;
> >> +} RAWState;
> >> +
> >> +static int net_raw_fd_init(Monitor *mon, const char *ifname, int promisc)
> >> +{
> >> +	int fd, ret;
> >> +	struct ifreq req;
> >> +	struct sockaddr_ll lladdr;
> >> +
> >> +	fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> >> +	if (fd < 0)
> >> +		fprintf(stderr, "packet socket failed\n");
> >>   
> >
> > CodingStyle
> >
> > Also, this error checking should use the monitor error reporting  
> > framework.  And falling through with fd=-1 certainly means we'll SEGV or  
> > worse further down.
> >
> >> +	memset(&req, 0, sizeof(req));
> >> +	strncpy(req.ifr_name, ifname, IFNAMSIZ-1);
> >>   
> >
> > Would be better to just use snprintf
> >
> >> +	ret = ioctl(fd, SIOCGIFINDEX, &req);
> >> +	if (ret < 0)
> >> +		fprintf(stderr, "SIOCGIFINDEX failed\n");
> >> +
> >> +	memset(&lladdr, 0, sizeof(lladdr));
> >> +	lladdr.sll_family   = AF_PACKET;
> >> +	lladdr.sll_protocol = htons(ETH_P_ALL);
> >> +	lladdr.sll_ifindex  = req.ifr_ifindex;
> >> +	ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr));
> >> +	if (ret < 0)
> >> +		fprintf(stderr, "bind failed\n");
> >>
> >>   
> >
> > Error checking is bad here.
> >
> >> +	/* set iface to promiscuous mode (packets sent to the VM MAC) */
> >> +	if (promisc) {
> >> +		ret = ioctl(fd, SIOCGIFFLAGS, &req);
> >> +		if (ret < 0)
> >> +			perror("SIOCGIFFLAGS failed\n");
> >> +		req.ifr_flags |= IFF_PROMISC;
> >> +		ret = ioctl(fd, SIOCSIFFLAGS, &req);
> >> +		if (ret < 0)
> >> +			fprintf(stderr, "SIOCSIFFLAGS to promiscuous failed\n");
> >> +	}
> >>   
> >
> > I suspect these ioctls are Linux specific.
> >
> >> +	ret = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
> >> +	if (ret < 0)
> >> +		fprintf(stderr, "O_NONBLOCK set failed\n");
> >> +
> >> +	return fd;
> >> +}
> >> +
> >> +static void raw_cleanup(VLANClientState *vc)
> >> +{
> >> +	struct ifreq req;
> >> +	RAWState *s = vc->opaque;
> >> +
> >> +	qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> >> +	if (s->promisc) {
> >> +		ioctl(s->fd, SIOCGIFFLAGS, &req);
> >> +		req.ifr_flags &= ~IFF_PROMISC;
> >> +		ioctl(s->fd, SIOCSIFFLAGS, &req);
> >> +	}
> >> +	close(s->fd);
> >> +	qemu_free(s);
> >> +}
> >> +
> >> +static void raw_send(void *opaque);
> >> +
> >> +static int raw_can_send(void *opaque)
> >> +{
> >> +	RAWState *s = opaque;
> >> +
> >> +	return qemu_can_send_packet(s->vc);
> >> +}
> >> +
> >> +static void raw_send_completed(VLANClientState *vc, ssize_t len)
> >> +{
> >> +	RAWState *s = vc->opaque;
> >> +
> >> +	qemu_set_fd_handler2(s->fd, raw_can_send, raw_send, NULL, s);
> >> +}
> >> +
> >> +static void raw_send(void *opaque)
> >> +{
> >> +	RAWState *s = opaque;
> >> +	int size;
> >> +
> >> +	do {
> >> +		size = recv(s->fd, s->buf, sizeof(s->buf), MSG_TRUNC);
> >> +		if (size <= 0)
> >> +			break;
> >>   
> >
> > Need to handle EINTR.
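
A small sketch of the EINTR handling meant here, assuming a wrapper
around recv() is acceptable. recv_retry is a hypothetical name; it only
restarts the call when a signal interrupted it and passes every other
result, including EAGAIN on a non-blocking socket, back to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: recv() that restarts after a signal.
 * Returns whatever recv() returns once it is not EINTR. */
static ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, flags);
    } while (n < 0 && errno == EINTR);

    return n;
}
```
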
> >
> >> +		size = qemu_send_packet_async(s->vc, s->buf, size,
> >> +						raw_send_completed);
> >> +		if (size == 0)
> >> +			qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> >> +
> >> +	} while (size > 0);
> >> +}
> >> +
> >> +static ssize_t raw_receive_iov(VLANClientState *vc, const struct iovec *iov,
> >> +				int iovcnt)
> >> +{
> >> +	ssize_t len;
> >> +	RAWState *s = vc->opaque;
> >> +
> >> +	do {
> >> +		len = writev(s->fd, iov, iovcnt);
> >> +	} while (len == -1 && (errno == EINTR || errno == EAGAIN));
> >>   
> >
> > Spinning on EAGAIN is certainly wrong.
> >
> >> +static int net_raw_init(Monitor *mon, VLANState *vlan, const char *model,
> >> +			const char *name, const char *ifname,
> >> +			int promisc, int fd)
> >> +{
> >> +	RAWState *s;
> >> +
> >> +	s = qemu_mallocz(sizeof(RAWState));
> >> +
> >> +	if (fd == -1) {
> >> +		s->fd = net_raw_fd_init(mon, ifname, promisc);
> >> +		s->promisc = promisc;
> >> +	} else
> >> +		s->fd = fd;
> >> +
> >> +	fcntl(s->fd, F_SETFL, O_NONBLOCK);
> >>   
> >
> > For net_raw_fd_init, you've already set O_NONBLOCK, but you're also
> > removing any other flags that have been set, which is probably wrong for
> > a passed-in fd.
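
To illustrate the point about flags: F_SETFL replaces the file status
flags, so passing O_NONBLOCK alone clears things like O_APPEND on a
descriptor someone else opened. A read-modify-write sketch, with the
hypothetical name set_nonblocking:

```c
#include <fcntl.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other
 * file status flags. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    if (flags & O_NONBLOCK) {
        return 0;   /* already set, nothing to do */
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

The same helper would work for both the freshly opened socket and an fd
handed in from the monitor.
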
> >
> >> +	s->vc = qemu_new_vlan_client(vlan, model, name, NULL, raw_receive,
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index bde3e3f..0d5440f 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -825,6 +825,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
> >>      "                default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n"
> >>  #endif
> >>  #endif
> >> +    "-net raw[,vlan=n][,name=str],ifname=name[,promisc=m]\n"
> >> +    "                bound the host network interface to VLAN 'n' in a raw manner:\n"
> >> +    "                packets received on the interface are delivered to the vlan and\n"
> >> +    "                packets delivered on the vlan are sent to the interface\n"
> >>      "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
> >>      "                connect the vlan 'n' to another VLAN using a socket connection\n"
> >>      "-net socket[,vlan=n][,name=str][,fd=h][,mcast=maddr:port]\n"
> >>   
> >
> > Needs documentation.
> >
> > Regards,
> >
> > Anthony Liguori
> 

--
			Gleb.

