From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1My5kz-0006oU-1n for qemu-devel@nongnu.org; Wed, 14 Oct 2009 11:27:17 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1My5kt-0006h6-BL for qemu-devel@nongnu.org; Wed, 14 Oct 2009 11:27:15 -0400 Received: from [199.232.76.173] (port=43427 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1My5kt-0006gj-5x for qemu-devel@nongnu.org; Wed, 14 Oct 2009 11:27:11 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58370) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1My5ks-0004DT-Hk for qemu-devel@nongnu.org; Wed, 14 Oct 2009 11:27:11 -0400 Date: Wed, 14 Oct 2009 17:24:52 +0200 From: "Michael S. Tsirkin" Message-ID: <20091014152451.GA30179@redhat.com> References: <20091014143415.GA29937@redhat.com> <4AD5E449.9070301@codemonkey.ws> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4AD5E449.9070301@codemonkey.ws> Subject: [Qemu-devel] Re: [PATCH-updated] qemu/net: add raw backend List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Anthony Liguori Cc: Or Gerlitz , Arnd Bergmann , qemu-devel@nongnu.org On Wed, Oct 14, 2009 at 09:46:33AM -0500, Anthony Liguori wrote: > Or Gerlitz wrote: >> Add raw network backend option which uses a packet socket to provide >> raw networking access. Once the socket is opened it's bound to a >> provided host interface, such that packets received on the interface >> are delivered to the VM and packets sent by the VM are sent to the >> interface. >> >> This is functionally similar to the existing pcap network >> backend, with the same advantages and problems. >> Differences from pcap: >> - can get an open socket from the monitor, >> which allows running without NET_ADMIN priviledges >> - support iovec sends with writev, saving one data copy >> - one less dependency on an external library >> - we have access to the underlying file descriptor >> which makes it possible to connect to vhost net >> - don't support polling all interfaces, always bind to a specific one >> > > Networking is probably the area in qemu that users most frequently > stumble with. The most common problems are: > > 1) slirp does not behave how they think it should (icmp doesn't work, > guest isn't accessable from host) > 2) it's difficult to figure out which backend behaves the way they want > (socket vs. vde vs. tap) > 3) when they figure out they need tap, tap is difficult to setup tap does not need any setup. problem is, bridge needs setup. > The problem with introducing a raw backend (or a pcap backend) > is that > it makes #2 even worse because now a user has to figure out whether they > need raw/pcap or tap. But most troubling, it introduces another issue: > > 4) raw does not behave how they think it should because guest<->host > networking does not work bidirectionally OTOH icmp works fine, so we are not worse off than with slirp. > So unless there's an extremely compelling reason to have this, I work with remote machines all the time, having to fiddle with host bridge/network setup means I always risk losing the only way to admin the machine. So it's slirp or raw for me. If I'm the only one like this, I can keep maintaining this patch, but I doubt it. I consider this a compelling reason. > I'd rather not introduce this complexity. Does another option really add that much complexity? We add options all the time ... > NB, I see this as a problem with > vhost_net too if #4 is also true in that context. It's up to user whether to connect vhost net to tap or socket. I haven't posted userspace code to connect vhost to tap yet but I will RSN. >> Signed-off-by: Or Gerlitz >> Signed-off-by: Michael S. Tsirkin >> --- >> hw/virtio-net.c | 3 +- >> net.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> qemu-options.hx | 4 + >> 3 files changed, 198 insertions(+), 1 deletions(-) >> >> diff --git a/hw/virtio-net.c b/hw/virtio-net.c >> index 1ac05a2..95d9f93 100644 >> --- a/hw/virtio-net.c >> +++ b/hw/virtio-net.c >> @@ -545,7 +545,8 @@ static ssize_t virtio_net_receive2(VLANClientState *vc, const uint8_t *buf, size >> virtqueue_pop(n->rx_vq, &elem) == 0) { >> if (i == 0) >> return -1; >> - fprintf(stderr, "virtio-net truncating packet\n"); >> + fprintf(stderr, "virtio-net truncating packet. offset %zd size %zd\n", >> + offset, size); >> exit(1); >> } >> > > This doesn't belong here. >> diff --git a/net.c b/net.c >> index d93eaef..1e0e874 100644 >> --- a/net.c >> +++ b/net.c >> @@ -93,6 +93,9 @@ >> #endif >> #endif >> +#include >> +#include >> + >> > > This is certainly missing guards. > >> #if defined(__OpenBSD__) >> #include >> #endif >> @@ -1860,6 +1863,158 @@ static TAPState *net_tap_init(VLANState *vlan, const char *model, >> #endif /* !_WIN32 */ >> +typedef struct RAWState { >> + VLANClientState *vc; >> + int fd; >> + uint8_t buf[4096]; >> + int promisc; >> +} RAWState; >> + >> +static int net_raw_fd_init(Monitor *mon, const char *ifname, int promisc) >> +{ >> + int fd, ret; >> + struct ifreq req; >> + struct sockaddr_ll lladdr; >> + >> + fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); >> + if (fd < 0) >> + fprintf(stderr, "packet socket failed\n"); >> > > CodingStyle > > Also, this error checking should use the monitor error reporting > framework. And falling through with fd=-1 certainly means we'll SEGV or > worse further down. > >> + memset(&req, 0, sizeof(req)); >> + strncpy(req.ifr_name, ifname, IFNAMSIZ-1); >> > > Would be better to just use snprintf > >> + ret = ioctl(fd, SIOCGIFINDEX, &req); >> + if (ret < 0) >> + fprintf(stderr, "SIOCGIFINDEX failed\n"); >> + >> + memset(&lladdr, 0, sizeof(lladdr)); >> + lladdr.sll_family = AF_PACKET; >> + lladdr.sll_protocol = htons(ETH_P_ALL); >> + lladdr.sll_ifindex = req.ifr_ifindex; >> + ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr)); >> + if (ret < 0) >> + fprintf(stderr, "bind failed\n"); >> >> > > Error checking is bad here. > >> + /* set iface to promiscuous mode (packets sent to the VM MAC) */ >> + if (promisc) { >> + ret = ioctl(fd, SIOCGIFFLAGS, &req); >> + if (ret < 0) >> + perror("SIOCGIFFLAGS failed\n"); >> + req.ifr_flags |= IFF_PROMISC; >> + ret = ioctl(fd, SIOCSIFFLAGS, &req); >> + if (ret < 0) >> + fprintf(stderr, "SIOCSIFFLAGS to promiscous failed\n"); >> + } >> > > I suspect these ioctls are Linux specific. > >> + ret = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK); >> + if (ret < 0) >> + fprintf(stderr, "O_NONBLOCK set failed\n"); >> + >> + return fd; >> +} >> + >> +static void raw_cleanup(VLANClientState *vc) >> +{ >> + struct ifreq req; >> + RAWState *s = vc->opaque; >> + >> + qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL); >> + if (s->promisc) { >> + ioctl(s->fd, SIOCGIFFLAGS, &req); >> + req.ifr_flags &= ~IFF_PROMISC; >> + ioctl(s->fd, SIOCSIFFLAGS, &req); >> + } >> + close(s->fd); >> + qemu_free(s); >> +} >> + >> +static void raw_send(void *opaque); >> + >> +static int raw_can_send(void *opaque) >> +{ >> + RAWState *s = opaque; >> + >> + return qemu_can_send_packet(s->vc); >> +} >> + >> +static void raw_send_completed(VLANClientState *vc, ssize_t len) >> +{ >> + RAWState *s = vc->opaque; >> + >> + qemu_set_fd_handler2(s->fd, raw_can_send, raw_send, NULL, s); >> +} >> + >> +static void raw_send(void *opaque) >> +{ >> + RAWState *s = opaque; >> + int size; >> + >> + do { >> + size = recv(s->fd, s->buf, sizeof(s->buf), MSG_TRUNC); >> + if (size <= 0) >> + break; >> > > Need to handle EINTR. > >> + size = qemu_send_packet_async(s->vc, s->buf, size, >> + raw_send_completed); >> + if (size == 0) >> + qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL); >> + >> + } while (size > 0); >> +} >> + >> +static ssize_t raw_receive_iov(VLANClientState *vc, const struct iovec *iov, >> + int iovcnt) >> +{ >> + ssize_t len; >> + RAWState *s = vc->opaque; >> + >> + do { >> + len = writev(s->fd, iov, iovcnt); >> + } while (len == -1 && (errno == EINTR || errno == EAGAIN)); >> > > Spinning on EAGAIN is certainly wrong. > >> +static int net_raw_init(Monitor *mon, VLANState *vlan, const char *model, >> + const char *name, const char *ifname, >> + int promisc, int fd) >> +{ >> + RAWState *s; >> + >> + s = qemu_mallocz(sizeof(RAWState)); >> + >> + if (fd == -1) { >> + s->fd = net_raw_fd_init(mon, ifname, promisc); >> + s->promisc = promisc; >> + } else >> + s->fd = fd; >> + >> + fcntl(s->fd, F_SETFL, O_NONBLOCK); >> > > For net_raw_fd_int, you've already set O_NONBLOCK but you're also > removing any other flags that have been set which is probably wrong for > a passed in fd. > >> + s->vc = qemu_new_vlan_client(vlan, model, name, NULL, raw_receive, >> diff --git a/qemu-options.hx b/qemu-options.hx >> index bde3e3f..0d5440f 100644 >> --- a/qemu-options.hx >> +++ b/qemu-options.hx >> @@ -825,6 +825,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net, >> " default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n" >> #endif >> #endif >> + "-net raw[,vlan=n][,name=str],ifname=name[,promisc=m]\n" >> + " bound the host network interface to VLAN 'n' in a raw manner:\n" >> + " packets received on the interface are delivered to the vlan and\n" >> + " packets delivered on the vlan are sent to the interface\n" >> "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n" >> " connect the vlan 'n' to another VLAN using a socket connection\n" >> "-net socket[,vlan=n][,name=str][,fd=h][,mcast=maddr:port]\n" >> > > Needs documentation. > > Regards, > > Anthony Liguori