* [PATCH 0/12] virtio_net perf patches
@ 2008-08-11 20:12 Mark McLoughlin
2008-08-11 20:12 ` [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer Mark McLoughlin
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm
Hi Avi,
Here's the set of patches that I think make sense to apply.
I've left out Anthony's zero-copy patch since I wasn't seeing
a measurable difference with it, it's quite invasive and isn't safe
with the "drop the global mutex during tapfd read()" patch.
I left that mutex patch 'til last since you may still be
nervous about concurrency issues and also the posix-timers kernel race
that it triggers.
Another change since the first round of patches is that it
took a bit of ugly hackery to handle the case where IFF_VNET_HDR
is supported and we're using e.g. e1000. Since it's difficult to
only enable IFF_VNET_HDR when we're using virtio_net_hdr, I went with
this approach.
Lastly, I need to add a "vnet_hdr=on" param to "-net tap" so
that we can know if the supplied tap fd has IFF_VNET_HDR enabled.
There's no interface to query that on the fd right now, and even if
I did cook up a patch it probably wouldn't make 2.6.27.
Cheers,
Mark.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer
2008-08-11 20:12 [PATCH 0/12] virtio_net perf patches Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 02/12] kvm: qemu: Remove virtio_net tx ring-full heuristic Mark McLoughlin
2008-08-11 20:30 ` [PATCH 0/12] virtio_net perf patches Anthony Liguori
2008-08-12 13:51 ` Avi Kivity
2 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
The current virtio_net tx timer is 2ns, which doesn't make
any sense. Set it to a more reasonable 250us instead.
However, even though we were requesting a 2ns tx timer, it
was actually getting limited to MIN_TIMER_REARM_US which is
currently 250us.
So, even though the timer itself would only fire after
250us, expire_time was only set to +2ns, so we'd get the
timeout callback next time qemu_run_timers() was called from
the mainloop.
This probably accounted for a lot of the jitter in the
throughput numbers - the effective tx timer length was
anywhere between 2ns and 250us depending on e.g. whether
there was rx data available on the tap fd.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 2e57e5a..3a39c8f 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -26,7 +26,7 @@
#define VIRTIO_NET_F_MAC 5
#define VIRTIO_NET_F_GS0 6
-#define TX_TIMER_INTERVAL (1000 / 500)
+#define TX_TIMER_INTERVAL 250000 /* 250 us */
/* The config defining mac address (6 bytes) */
struct virtio_net_config
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 02/12] kvm: qemu: Remove virtio_net tx ring-full heuristic
2008-08-11 20:12 ` [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 03/12] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
virtio_net tries to guess when it has received a tx
notification from the guest whether it indicates that the
guest has no more room in the tx ring and it should
immediately flush the queued buffers.
The heuristic is based on the fact that there are 128
buffer entries in the ring and each packet uses 2 buffers
(i.e. the virtio_net_hdr and the packet's linear data).
Using GSO or increasing the size of the rings will break
that heuristic, so let's remove it and assume that any
notification from the guest after we've disabled
notifications indicates that we should flush our buffers.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 3a39c8f..b001475 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
{
VirtIONet *n = to_virtio_net(vdev);
- if (n->tx_timer_active &&
- (vq->vring.avail->idx - vq->last_avail_idx) == 64) {
+ if (n->tx_timer_active) {
vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
qemu_del_timer(n->tx_timer);
n->tx_timer_active = 0;
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 03/12] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY
2008-08-11 20:12 ` [PATCH 02/12] kvm: qemu: Remove virtio_net tx ring-full heuristic Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 04/12] kvm: qemu: Disable recv notifications until avail buffers exhausted Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Set the VIRTIO_F_NOTIFY_ON_EMPTY feature bit so the
guest can rely on us notifying them when the queue
is empty.
Also, only notify when the available queue is empty
*and* when we've finished with all the buffers we
had detached. Right now, when the queue is empty,
we notify the guest for every used buffer.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio.c | 6 +++++-
qemu/hw/virtio.h | 5 +++++
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
index 3429ac8..e035e4e 100644
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -138,6 +138,7 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
/* Make sure buffer is written before we update index. */
wmb();
vq->vring.used->idx++;
+ vq->inuse--;
}
int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
@@ -187,6 +188,8 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
elem->index = head;
+ vq->inuse++;
+
return elem->in_num + elem->out_num;
}
@@ -275,6 +278,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
switch (addr) {
case VIRTIO_PCI_HOST_FEATURES:
ret = vdev->get_features(vdev);
+ ret |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
break;
case VIRTIO_PCI_GUEST_FEATURES:
ret = vdev->features;
@@ -431,7 +435,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
{
/* Always notify when queue is empty */
- if (vq->vring.avail->idx != vq->last_avail_idx &&
+ if ((vq->inuse || vq->vring.avail->idx != vq->last_avail_idx) &&
(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
return;
diff --git a/qemu/hw/virtio.h b/qemu/hw/virtio.h
index 61f5038..1adaed3 100644
--- a/qemu/hw/virtio.h
+++ b/qemu/hw/virtio.h
@@ -30,6 +30,10 @@
/* We've given up on this device. */
#define VIRTIO_CONFIG_S_FAILED 0x80
+/* We notify when the ring is completely used, even if the guest is supressing
+ * callbacks */
+#define VIRTIO_F_NOTIFY_ON_EMPTY 24
+
/* from Linux's linux/virtio_ring.h */
/* This marks a buffer as continuing via the next field. */
@@ -86,6 +90,7 @@ struct VirtQueue
VRing vring;
uint32_t pfn;
uint16_t last_avail_idx;
+ int inuse;
void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
};
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 04/12] kvm: qemu: Disable recv notifications until avail buffers exhausted
2008-08-11 20:12 ` [PATCH 03/12] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 05/12] kvm: qemu: Add support for partial csums and GSO Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Once we know we have buffers available on the receive ring, we can
safely disable notifications.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index b001475..47349ce 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -106,9 +106,12 @@ static int virtio_net_can_receive(void *opaque)
!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
return 0;
- if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx)
+ if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx) {
+ n->rx_vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
return 0;
+ }
+ n->rx_vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
return 1;
}
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 05/12] kvm: qemu: Add support for partial csums and GSO
2008-08-11 20:12 ` [PATCH 04/12] kvm: qemu: Disable recv notifications until avail buffers exhausted Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 06/12] kvm: qemu: Rename tap_readv() to tap_receive_iov() Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
The tun/tap driver in 2.6.27 contains a new IFF_VNET_HDR
flag which makes every packet read from or written to the
tap fd be preceded by a virtio_net_hdr header.
This allows us to pass larger packets and packets with
partial checkums between the guest and the host, greatly
increasing the achievable bandwidth.
If the tap device has IFF_VNET_HDR enabled, the virtio-net
driver the merely needs to shuffle the headers supplied
by the guest or host to the other side.
We also inform the guest that we can now receive GSO packets
and have it confirm whether it can do likewise. If the guest
can receive GSO packets, we enable GSO on the tun device
using TUNSETOFFLOAD.
Note also that we increase the size of the tap packet buffer
to accomodate the largest possible GSO packet.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 86 +++++++++++++++++++++++++++++++++++++++++---------
qemu/net.h | 5 +++
qemu/vl.c | 75 ++++++++++++++++++++++++++++++++++++++-----
3 files changed, 142 insertions(+), 24 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 47349ce..9b1298e 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -22,9 +22,18 @@
#define VIRTIO_ID_NET 1
/* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_NO_CSUM 0
-#define VIRTIO_NET_F_MAC 5
-#define VIRTIO_NET_F_GS0 6
+#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */
+#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */
+#define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
+#define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */
+#define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */
+#define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. */
+#define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */
+#define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */
+#define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */
#define TX_TIMER_INTERVAL 250000 /* 250 us */
@@ -42,8 +51,6 @@ struct virtio_net_hdr
uint8_t flags;
#define VIRTIO_NET_HDR_GSO_NONE 0 // Not a GSO frame
#define VIRTIO_NET_HDR_GSO_TCPV4 1 // GSO frame, IPv4 TCP (TSO)
-/* FIXME: Do we need this? If they said they can handle ECN, do they care? */
-#define VIRTIO_NET_HDR_GSO_TCPV4_ECN 2 // GSO frame, IPv4 TCP w/ ECN
#define VIRTIO_NET_HDR_GSO_UDP 3 // GSO frame, IPv4 UDP (UFO)
#define VIRTIO_NET_HDR_GSO_TCPV6 4 // GSO frame, IPv6 TCP
#define VIRTIO_NET_HDR_GSO_ECN 0x80 // TCP has ECN set
@@ -85,7 +92,38 @@ static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
static uint32_t virtio_net_get_features(VirtIODevice *vdev)
{
- return (1 << VIRTIO_NET_F_MAC);
+ VirtIONet *n = to_virtio_net(vdev);
+ VLANClientState *host = n->vc->vlan->first_client;
+ uint32_t features = (1 << VIRTIO_NET_F_MAC);
+
+ if (tap_has_vnet_hdr(host)) {
+ features |= (1 << VIRTIO_NET_F_CSUM);
+ features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
+ features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
+ features |= (1 << VIRTIO_NET_F_GUEST_TSO6);
+ features |= (1 << VIRTIO_NET_F_GUEST_ECN);
+ features |= (1 << VIRTIO_NET_F_HOST_TSO4);
+ features |= (1 << VIRTIO_NET_F_HOST_TSO6);
+ features |= (1 << VIRTIO_NET_F_HOST_ECN);
+ /* Kernel can't actually handle UFO in software currently. */
+ }
+
+ return features;
+}
+
+static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
+{
+ VirtIONet *n = to_virtio_net(vdev);
+ VLANClientState *host = n->vc->vlan->first_client;
+
+ if (!tap_has_vnet_hdr(host) || !host->set_offload)
+ return;
+
+ host->set_offload(host,
+ (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+ (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+ (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+ (features >> VIRTIO_NET_F_GUEST_ECN) & 1);
}
/* RX */
@@ -121,6 +159,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
VirtQueueElement elem;
struct virtio_net_hdr *hdr;
int offset, i;
+ int total;
if (virtqueue_pop(n->rx_vq, &elem) == 0)
return;
@@ -134,18 +173,26 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
hdr->flags = 0;
hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
- /* copy in packet. ugh */
offset = 0;
+ total = sizeof(*hdr);
+
+ if (tap_has_vnet_hdr(n->vc->vlan->first_client)) {
+ memcpy(hdr, buf, sizeof(*hdr));
+ offset += total;
+ }
+
+ /* copy in packet. ugh */
i = 1;
while (offset < size && i < elem.in_num) {
int len = MIN(elem.in_sg[i].iov_len, size - offset);
memcpy(elem.in_sg[i].iov_base, buf + offset, len);
offset += len;
+ total += len;
i++;
}
/* signal other side */
- virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset);
+ virtqueue_push(n->rx_vq, &elem, total);
virtio_notify(&n->vdev, n->rx_vq);
}
@@ -153,23 +200,31 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
static void virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
{
VirtQueueElement elem;
+ int has_vnet_hdr = tap_has_vnet_hdr(n->vc->vlan->first_client);
if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
return;
while (virtqueue_pop(vq, &elem)) {
ssize_t len = 0;
+ unsigned int out_num = elem.out_num;
+ struct iovec *out_sg = &elem.out_sg[0];
+
+ if (out_num < 1 || out_sg->iov_len != sizeof(struct virtio_net_hdr)) {
+ fprintf(stderr, "virtio-net header not in first element\n");
+ exit(1);
+ }
- if (elem.out_num < 1 ||
- elem.out_sg[0].iov_len != sizeof(struct virtio_net_hdr)) {
- fprintf(stderr, "virtio-net header not in first element\n");
- exit(1);
+ /* ignore the header if GSO is not supported */
+ if (!has_vnet_hdr) {
+ out_num--;
+ out_sg++;
+ len += sizeof(struct virtio_net_hdr);
}
- /* ignore the header for now */
- len = qemu_sendv_packet(n->vc, &elem.out_sg[1], elem.out_num - 1);
+ len += qemu_sendv_packet(n->vc, out_sg, out_num);
- virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len);
+ virtqueue_push(vq, &elem, len);
virtio_notify(&n->vdev, vq);
}
}
@@ -249,6 +304,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
n->vdev.update_config = virtio_net_update_config;
n->vdev.get_features = virtio_net_get_features;
+ n->vdev.set_features = virtio_net_set_features;
n->rx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_rx);
n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
memcpy(n->mac, nd->macaddr, 6);
diff --git a/qemu/net.h b/qemu/net.h
index afc29ef..ae1a338 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -9,12 +9,15 @@ typedef ssize_t (IOReadvHandler)(void *, const struct iovec *, int);
typedef struct VLANClientState VLANClientState;
+typedef void (SetOffload)(VLANClientState *, int, int, int, int);
+
struct VLANClientState {
IOReadHandler *fd_read;
IOReadvHandler *fd_readv;
/* Packets may still be sent if this returns zero. It's used to
rate-limit the slirp code. */
IOCanRWHandler *fd_can_read;
+ SetOffload *set_offload;
void *opaque;
struct VLANClientState *next;
struct VLANState *vlan;
@@ -42,6 +45,8 @@ void qemu_handler_true(void *opaque);
void do_info_network(void);
+int tap_has_vnet_hdr(void *opaque);
+
int net_client_init(const char *device, const char *opts);
void net_client_uninit(NICInfo *nd);
diff --git a/qemu/vl.c b/qemu/vl.c
index b39e919..fc53ae0 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4340,14 +4340,33 @@ void do_info_slirp(void)
#endif /* CONFIG_SLIRP */
-#if !defined(_WIN32)
+#ifdef _WIN32
+
+int tap_has_vnet_hdr(void *opaque)
+{
+ return 0;
+}
+
+#else /* !defined(_WIN32) */
+
+#ifndef IFF_VNET_HDR
+#define TAP_BUFSIZE 4096
+#else
+#include <linux/virtio_net.h>
+#define ETH_HLEN 14
+#define ETH_DATA_LEN 1500
+#define MAX_PACKET_LEN (ETH_HLEN + ETH_DATA_LEN)
+#define MAX_SKB_FRAGS ((65536/TARGET_PAGE_SIZE) + 2)
+#define TAP_BUFSIZE (sizeof(struct virtio_net_hdr) + MAX_PACKET_LEN + (MAX_SKB_FRAGS*TARGET_PAGE_SIZE))
+#endif
typedef struct TAPState {
VLANClientState *vc;
int fd;
char down_script[1024];
- char buf[4096];
+ char buf[TAP_BUFSIZE];
int size;
+ unsigned int has_vnet_hdr : 1;
} TAPState;
static void tap_receive(void *opaque, const uint8_t *buf, int size)
@@ -4442,9 +4461,40 @@ static void tap_send(void *opaque)
} while (s->size > 0);
}
+int tap_has_vnet_hdr(void *opaque)
+{
+ VLANClientState *vc = opaque;
+ TAPState *s = vc->opaque;
+
+ return s ? s->has_vnet_hdr : 0;
+}
+
+#ifdef TUNSETOFFLOAD
+static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
+ int ecn)
+{
+ TAPState *s = vc->opaque;
+ unsigned int offload = 0;
+
+ if (csum) {
+ offload |= TUN_F_CSUM;
+ if (tso4)
+ offload |= TUN_F_TSO4;
+ if (tso6)
+ offload |= TUN_F_TSO6;
+ if ((tso4 || tso6) && ecn)
+ offload |= TUN_F_TSO_ECN;
+ }
+
+ if (ioctl(s->fd, TUNSETOFFLOAD, offload) != 0)
+ fprintf(stderr, "TUNSETOFFLOAD ioctl() failed: %s\n",
+ strerror(errno));
+}
+#endif /* TUNSETOFFLOAD */
+
/* fd support */
-static TAPState *net_tap_fd_init(VLANState *vlan, int fd)
+static TAPState *net_tap_fd_init(VLANState *vlan, int fd, int vnet_hdr)
{
TAPState *s;
@@ -4452,15 +4502,19 @@ static TAPState *net_tap_fd_init(VLANState *vlan, int fd)
if (!s)
return NULL;
s->fd = fd;
+ s->has_vnet_hdr = vnet_hdr != 0;
s->vc = qemu_new_vlan_client(vlan, tap_receive, NULL, s);
s->vc->fd_readv = tap_readv;
+#ifdef TUNSETOFFLOAD
+ s->vc->set_offload = tap_set_offload;
+#endif
qemu_set_fd_handler2(s->fd, tap_can_send, tap_send, NULL, s);
snprintf(s->vc->info_str, sizeof(s->vc->info_str), "tap: fd=%d", fd);
return s;
}
#if defined (_BSD) || defined (__FreeBSD_kernel__)
-static int tap_open(char *ifname, int ifname_size)
+static int tap_open(char *ifname, int ifname_size, int *vnet_hdr)
{
int fd;
char *dev;
@@ -4602,7 +4656,7 @@ int tap_alloc(char *dev)
return tap_fd;
}
-static int tap_open(char *ifname, int ifname_size)
+static int tap_open(char *ifname, int ifname_size, int *vnet_hdr)
{
char dev[10]="";
int fd;
@@ -4615,7 +4669,7 @@ static int tap_open(char *ifname, int ifname_size)
return fd;
}
#else
-static int tap_open(char *ifname, int ifname_size)
+static int tap_open(char *ifname, int ifname_size, int *vnet_hdr)
{
struct ifreq ifr;
int fd, ret;
@@ -4684,13 +4738,15 @@ static int net_tap_init(VLANState *vlan, const char *ifname1,
{
TAPState *s;
int fd;
+ int vnet_hdr;
char ifname[128];
if (ifname1 != NULL)
pstrcpy(ifname, sizeof(ifname), ifname1);
else
ifname[0] = '\0';
- TFR(fd = tap_open(ifname, sizeof(ifname)));
+ vnet_hdr = 0;
+ TFR(fd = tap_open(ifname, sizeof(ifname), &vnet_hdr));
if (fd < 0)
return -1;
@@ -4700,9 +4756,10 @@ static int net_tap_init(VLANState *vlan, const char *ifname1,
if (launch_script(setup_script, ifname, fd))
return -1;
}
- s = net_tap_fd_init(vlan, fd);
+ s = net_tap_fd_init(vlan, fd, vnet_hdr);
if (!s)
return -1;
+
snprintf(s->vc->info_str, sizeof(s->vc->info_str),
"tap: ifname=%s setup_script=%s", ifname, setup_script);
if (down_script && strcmp(down_script, "no"))
@@ -5364,7 +5421,7 @@ int net_client_init(const char *device, const char *p)
fd = strtol(buf, NULL, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
ret = -1;
- if (net_tap_fd_init(vlan, fd))
+ if (net_tap_fd_init(vlan, fd, 0))
ret = 0;
} else {
if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 06/12] kvm: qemu: Rename tap_readv() to tap_receive_iov()
2008-08-11 20:12 ` [PATCH 05/12] kvm: qemu: Add support for partial csums and GSO Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 07/12] kvm: qemu: Move some code around for the next commit Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Rename tap_readv() so as to match tap_receive() naming and
also to allow a tap_writev() helper function to not seem so
weird.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index fc53ae0..126944d 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4382,8 +4382,8 @@ static void tap_receive(void *opaque, const uint8_t *buf, int size)
}
}
-static ssize_t tap_readv(void *opaque, const struct iovec *iov,
- int iovcnt)
+static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
+ int iovcnt)
{
TAPState *s = opaque;
ssize_t len;
@@ -4504,7 +4504,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan, int fd, int vnet_hdr)
s->fd = fd;
s->has_vnet_hdr = vnet_hdr != 0;
s->vc = qemu_new_vlan_client(vlan, tap_receive, NULL, s);
- s->vc->fd_readv = tap_readv;
+ s->vc->fd_readv = tap_receive_iov;
#ifdef TUNSETOFFLOAD
s->vc->set_offload = tap_set_offload;
#endif
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 07/12] kvm: qemu: Move some code around for the next commit
2008-08-11 20:12 ` [PATCH 06/12] kvm: qemu: Rename tap_readv() to tap_receive_iov() Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
0 siblings, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 126944d..f5aacf0 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4369,19 +4369,6 @@ typedef struct TAPState {
unsigned int has_vnet_hdr : 1;
} TAPState;
-static void tap_receive(void *opaque, const uint8_t *buf, int size)
-{
- TAPState *s = opaque;
- int ret;
- for(;;) {
- ret = write(s->fd, buf, size);
- if (ret < 0 && (errno == EINTR || errno == EAGAIN)) {
- } else {
- break;
- }
- }
-}
-
static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
int iovcnt)
{
@@ -4395,6 +4382,19 @@ static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
return len;
}
+static void tap_receive(void *opaque, const uint8_t *buf, int size)
+{
+ TAPState *s = opaque;
+ int ret;
+ for(;;) {
+ ret = write(s->fd, buf, size);
+ if (ret < 0 && (errno == EINTR || errno == EAGAIN)) {
+ } else {
+ break;
+ }
+ }
+}
+
static int tap_can_send(void *opaque)
{
TAPState *s = opaque;
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr
2008-08-11 20:12 ` [PATCH 07/12] kvm: qemu: Move some code around for the next commit Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 09/12] kvm: qemu: Actually enable GSO support Mark McLoughlin
2008-08-12 17:41 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
0 siblings, 2 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
The virtio-net driver is the only one which wishes to deal
with virtio_net_hdr headers, so add a "using_vnet_hdr" flag
to allow it to indicate this.
Preferably, we'd prefer to only enable IFF_VNET_HDR when
we're using virtio-net, but qemu's various abstractions
would make this very messy.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 1 +
qemu/net.h | 1 +
qemu/vl.c | 78 +++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 69 insertions(+), 11 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 9b1298e..2298316 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -97,6 +97,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev)
uint32_t features = (1 << VIRTIO_NET_F_MAC);
if (tap_has_vnet_hdr(host)) {
+ tap_using_vnet_hdr(host, 1);
features |= (1 << VIRTIO_NET_F_CSUM);
features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
diff --git a/qemu/net.h b/qemu/net.h
index ae1a338..4891669 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -46,6 +46,7 @@ void qemu_handler_true(void *opaque);
void do_info_network(void);
int tap_has_vnet_hdr(void *opaque);
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr);
int net_client_init(const char *device, const char *opts);
void net_client_uninit(NICInfo *nd);
diff --git a/qemu/vl.c b/qemu/vl.c
index f5aacf0..9bfe480 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4347,6 +4347,10 @@ int tap_has_vnet_hdr(void *opaque)
return 0;
}
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
+{
+}
+
#else /* !defined(_WIN32) */
#ifndef IFF_VNET_HDR
@@ -4367,10 +4371,11 @@ typedef struct TAPState {
char buf[TAP_BUFSIZE];
int size;
unsigned int has_vnet_hdr : 1;
+ unsigned int using_vnet_hdr : 1;
} TAPState;
-static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
- int iovcnt)
+static ssize_t tap_writev(void *opaque, const struct iovec *iov,
+ int iovcnt)
{
TAPState *s = opaque;
ssize_t len;
@@ -4382,17 +4387,44 @@ static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
return len;
}
+static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
+ int iovcnt)
+{
+ TAPState *s = opaque;
+ struct iovec *iov_copy;
+ struct virtio_net_hdr hdr = { 0, };
+
+ if (!s->has_vnet_hdr || s->using_vnet_hdr)
+ return tap_writev(opaque, iov, iovcnt);
+
+ iov_copy = alloca(sizeof(struct iovec) * (iovcnt + 1));
+
+ iov_copy[0].iov_base = &hdr;
+ iov_copy[0].iov_len = sizeof(hdr);
+
+ memcpy(&iov_copy[1], iov, sizeof(struct iovec) * iovcnt);
+
+ return tap_writev(opaque, iov_copy, iovcnt + 1);
+}
+
static void tap_receive(void *opaque, const uint8_t *buf, int size)
{
TAPState *s = opaque;
- int ret;
- for(;;) {
- ret = write(s->fd, buf, size);
- if (ret < 0 && (errno == EINTR || errno == EAGAIN)) {
- } else {
- break;
- }
+ struct virtio_net_hdr hdr = { 0, };
+ struct iovec iov[2];
+ int i = 0;
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ iov[i].iov_base = &hdr;
+ iov[i].iov_len = sizeof(hdr);
+ i++;
}
+
+ iov[i].iov_base = (char *) buf;
+ iov[i].iov_len = size;
+ i++;
+
+ tap_writev(opaque, iov, i);
}
static int tap_can_send(void *opaque)
@@ -4421,6 +4453,19 @@ static int tap_can_send(void *opaque)
return can_receive;
}
+static int tap_send_packet(TAPState *s)
+{
+ uint8_t *buf = s->buf;
+ int size = s->size;
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ buf += sizeof(struct virtio_net_hdr);
+ size -= sizeof(struct virtio_net_hdr);
+ }
+
+ return qemu_send_packet(s->vc, buf, size);
+}
+
static void tap_send(void *opaque)
{
TAPState *s = opaque;
@@ -4430,7 +4475,7 @@ static void tap_send(void *opaque)
int err;
/* If noone can receive the packet, buffer it */
- err = qemu_send_packet(s->vc, s->buf, s->size);
+ err = tap_send_packet(s);
if (err == -EAGAIN)
return;
}
@@ -4454,7 +4499,7 @@ static void tap_send(void *opaque)
int err;
/* If noone can receive the packet, buffer it */
- err = qemu_send_packet(s->vc, s->buf, s->size);
+ err = tap_send_packet(s);
if (err == -EAGAIN)
break;
}
@@ -4469,6 +4514,17 @@ int tap_has_vnet_hdr(void *opaque)
return s ? s->has_vnet_hdr : 0;
}
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
+{
+ VLANClientState *vc = opaque;
+ TAPState *s = vc->opaque;
+
+ if (!s || !s->has_vnet_hdr)
+ return;
+
+ s->using_vnet_hdr = using_vnet_hdr != 0;
+}
+
#ifdef TUNSETOFFLOAD
static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
int ecn)
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 09/12] kvm: qemu: Actually enable GSO support
2008-08-11 20:12 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Mark McLoughlin
2008-08-12 17:41 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Now that GSO support doesn't break e.g. e1000, we can
enable it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 9bfe480..14a66e0 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4737,6 +4737,19 @@ static int tap_open(char *ifname, int ifname_size, int *vnet_hdr)
}
memset(&ifr, 0, sizeof(ifr));
ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+
+#if defined(TUNGETFEATURES) && defined(IFF_VNET_HDR)
+ {
+ unsigned int features;
+
+ if (ioctl(fd, TUNGETFEATURES, &features) == 0 &&
+ features & IFF_VNET_HDR) {
+ *vnet_hdr = 1;
+ ifr.ifr_flags |= IFF_VNET_HDR;
+ }
+ }
+#endif
+
if (ifname[0] != '\0')
pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
else
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option
2008-08-11 20:12 ` [PATCH 09/12] kvm: qemu: Actually enable GSO support Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 11/12] kvm: qemu: Increase size of virtio_net rings Mark McLoughlin
2008-08-11 20:30 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Anthony Liguori
0 siblings, 2 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
Some management apps like libvirt creates a tap interface,
adds it to a bridge and then passes the fd to qemu-kvm.
Since we have no way of querying whether the IFF_VNET_HDR
flag has been enabled on the interface, the app needs to
tell us if has enabled it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 14a66e0..499d6f9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -5487,10 +5487,19 @@ int net_client_init(const char *device, const char *p)
int fd;
vlan->nb_host_devs++;
if (get_param_value(buf, sizeof(buf), "fd", p) > 0) {
+ int vnet_hdr = 0;
fd = strtol(buf, NULL, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
+ if (get_param_value(buf, sizeof(buf), "vnet_hdr", p) > 0) {
+ if (strcmp(buf, "on") == 0) {
+ vnet_hdr = 1;
+ } else if (strcmp(buf, "off") != 0) {
+ fprintf(stderr, "qemu: '%s' invalid vnet_hdr option\n", buf);
+ return 0;
+ }
+ }
ret = -1;
- if (net_tap_fd_init(vlan, fd, 0))
+ if (net_tap_fd_init(vlan, fd, vnet_hdr))
ret = 0;
} else {
if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 11/12] kvm: qemu: Increase size of virtio_net rings
2008-08-11 20:12 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
2008-08-11 20:12 ` [PATCH 12/12] kvm: qemu: Drop the mutex while reading from tapfd Mark McLoughlin
2008-08-11 20:30 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Anthony Liguori
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
GSO packets uses up a lot more buffer entries, so double
the size of the rings
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 2298316..61215b1 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -35,7 +35,7 @@
#define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */
#define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */
-#define TX_TIMER_INTERVAL 250000 /* 250 us */
+#define TX_TIMER_INTERVAL 150000 /* 150 us */
/* The config defining mac address (6 bytes) */
struct virtio_net_config
@@ -306,8 +306,8 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
n->vdev.update_config = virtio_net_update_config;
n->vdev.get_features = virtio_net_get_features;
n->vdev.set_features = virtio_net_set_features;
- n->rx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_rx);
- n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
+ n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+ n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx);
memcpy(n->mac, nd->macaddr, 6);
n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
virtio_net_can_receive, n);
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 12/12] kvm: qemu: Drop the mutex while reading from tapfd
2008-08-11 20:12 ` [PATCH 11/12] kvm: qemu: Increase size of virtio_net rings Mark McLoughlin
@ 2008-08-11 20:12 ` Mark McLoughlin
0 siblings, 0 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-11 20:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Mark McLoughlin
The idea here is that with GSO, packets are much larger
and we can allow the vcpu threads to e.g. process irq
acks during the window where we're reading these
packets from the tapfd.
One known issue with this is that it triggers a subtle
SMP race in the kernel's posix-timers and signalfd code.
See here for more details and a test case:
http://lkml.org/lkml/2008/7/17/151
The symptoms of this are that:
a) occassionally throughput drops almost to zero
b) manually doing "killall -ALRM qemu-kvm" kicks qemu
out if its funk.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 499d6f9..d3ca0f1 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4489,7 +4489,9 @@ static void tap_send(void *opaque)
sbuf.buf = s->buf;
s->size = getmsg(s->fd, NULL, &sbuf, &f) >=0 ? sbuf.len : -1;
#else
+ kvm_sleep_begin();
s->size = read(s->fd, s->buf, sizeof(s->buf));
+ kvm_sleep_end();
#endif
if (s->size == -1 && errno == EINTR)
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-11 20:12 [PATCH 0/12] virtio_net perf patches Mark McLoughlin
2008-08-11 20:12 ` [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer Mark McLoughlin
@ 2008-08-11 20:30 ` Anthony Liguori
2008-08-12 18:12 ` Mark McLoughlin
2008-08-12 13:51 ` Avi Kivity
2 siblings, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-08-11 20:30 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Avi Kivity, kvm
Mark McLoughlin wrote:
> Hi Avi,
> Here's the set of patches that I think make sense to apply.
>
We probably need to disable checksum offload on the RX side until we
figure out what to do about the broken dhclient. That's going to hit a
lot of users otherwise.
Regards,
Anthony Liguori
> I've left out Anthony's zero-copy patch since I wasn't seeing
> a measurable difference with it, it's quite invasive and isn't safe
> with the "drop the global mutex during tapfd read()" patch.
>
> I left that mutex patch 'til last since you may still be
> nervous about concurrency issues and also the posix-timers kernel race
> that it triggers.
>
> Another change since the first round of patches is that it
> took a bit of ugly hackery to handle the case where IFF_VNET_HDR
> is supported and we're using e.g. e1000. Since it's difficult to
> only enable IFF_VNET_HDR when we're using virtio_net_hdr, I went with
> this approach.
>
> Lastly, I need to add a "vnet_hdr=on" param to "-net tap" so
> that we can know if the supplied tap fd has IFF_VNET_HDR enabled.
> There's no interface to query that on the fd right now, and even if
> I did cook up a patch it probably wouldn't make 2.6.27.
>
> Cheers,
> Mark.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option
2008-08-11 20:12 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Mark McLoughlin
2008-08-11 20:12 ` [PATCH 11/12] kvm: qemu: Increase size of virtio_net rings Mark McLoughlin
@ 2008-08-11 20:30 ` Anthony Liguori
1 sibling, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-08-11 20:30 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Avi Kivity, kvm
Mark McLoughlin wrote:
> Some management apps like libvirt creates a tap interface,
> adds it to a bridge and then passes the fd to qemu-kvm.
>
> Since we have no way of querying whether the IFF_VNET_HDR
> flag has been enabled on the interface, the app needs to
> tell us if has enabled it.
>
Sounds like we should just patch the kernel to add a mechanism to query
the IF_VNET_HDR flag.
Regards,
Anthony Liguori
> Signed-off-by: Mark McLoughlin <markmc@redhat.com>
> ---
> qemu/vl.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/qemu/vl.c b/qemu/vl.c
> index 14a66e0..499d6f9 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -5487,10 +5487,19 @@ int net_client_init(const char *device, const char *p)
> int fd;
> vlan->nb_host_devs++;
> if (get_param_value(buf, sizeof(buf), "fd", p) > 0) {
> + int vnet_hdr = 0;
> fd = strtol(buf, NULL, 0);
> fcntl(fd, F_SETFL, O_NONBLOCK);
> + if (get_param_value(buf, sizeof(buf), "vnet_hdr", p) > 0) {
> + if (strcmp(buf, "on") == 0) {
> + vnet_hdr = 1;
> + } else if (strcmp(buf, "off") != 0) {
> + fprintf(stderr, "qemu: '%s' invalid vnet_hdr option\n", buf);
> + return 0;
> + }
> + }
> ret = -1;
> - if (net_tap_fd_init(vlan, fd, 0))
> + if (net_tap_fd_init(vlan, fd, vnet_hdr))
> ret = 0;
> } else {
> if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-11 20:12 [PATCH 0/12] virtio_net perf patches Mark McLoughlin
2008-08-11 20:12 ` [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer Mark McLoughlin
2008-08-11 20:30 ` [PATCH 0/12] virtio_net perf patches Anthony Liguori
@ 2008-08-12 13:51 ` Avi Kivity
2008-08-12 14:55 ` Avi Kivity
2008-08-13 14:39 ` [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR Mark McLoughlin
2 siblings, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-12 13:51 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm
Mark McLoughlin wrote:
> Hi Avi,
> Here's the set of patches that I think make sense to apply.
>
> I've left out Anthony's zero-copy patch since I wasn't seeing
> a measurable difference with it, it's quite invasive and isn't safe
> with the "drop the global mutex during tapfd read()" patch.
>
> I left that mutex patch 'til last since you may still be
> nervous about concurrency issues and also the posix-timers kernel race
> that it triggers.
>
I'm nervous, but let's see what happens. Worst case we make it conditional.
> Another change since the first round of patches is that it
> took a bit of ugly hackery to handle the case where IFF_VNET_HDR
> is supported and we're using e.g. e1000. Since it's difficult to
> only enable IFF_VNET_HDR when we're using virtio_net_hdr, I went with
> this approach.
>
>
Thanks, applied all...
> Lastly, I need to add a "vnet_hdr=on" param to "-net tap" so
> that we can know if the supplied tap fd has IFF_VNET_HDR enabled.
> There's no interface to query that on the fd right now, and even if
> I did cook up a patch it probably wouldn't make 2.6.27.
>
except for this, as I agree with Anthony. I think it can make it into
2.6.27, but if it doesn't, libvirt should simply not enable vnet_hdr
unless it is sure qemu can query for the feature.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-12 13:51 ` Avi Kivity
@ 2008-08-12 14:55 ` Avi Kivity
2008-08-13 14:39 ` [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR Mark McLoughlin
1 sibling, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-12 14:55 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm
Avi Kivity wrote:
>
> Thanks, applied all...
And reverted, since tap_receive_iov() doesn't compile on older hosts.
Please fix and resend.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr
2008-08-11 20:12 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
2008-08-11 20:12 ` [PATCH 09/12] kvm: qemu: Actually enable GSO support Mark McLoughlin
@ 2008-08-12 17:41 ` Mark McLoughlin
2008-08-13 9:13 ` Avi Kivity
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-12 17:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm
Hi Avi,
Thanks for catching the build error in this one.
Here's a new (yet uglier) version; the rest remain the same.
Cheers,
Mark.
Subject: [PATCH 08/11] kvm: qemu: Don't require all drivers to use virtio_net_hdr
The virtio-net driver is the only one which wishes to deal
with virtio_net_hdr headers, so add a "using_vnet_hdr" flag
to allow it to indicate this.
Preferably, we'd prefer to only enable IFF_VNET_HDR when
we're using virtio-net, but qemu's various abstractions
would make this very messy.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/hw/virtio-net.c | 1 +
qemu/net.h | 1 +
qemu/vl.c | 87 +++++++++++++++++++++++++++++++++++++++++++------
3 files changed, 78 insertions(+), 11 deletions(-)
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 9b1298e..2298316 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -97,6 +97,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev)
uint32_t features = (1 << VIRTIO_NET_F_MAC);
if (tap_has_vnet_hdr(host)) {
+ tap_using_vnet_hdr(host, 1);
features |= (1 << VIRTIO_NET_F_CSUM);
features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
diff --git a/qemu/net.h b/qemu/net.h
index ae1a338..4891669 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -46,6 +46,7 @@ void qemu_handler_true(void *opaque);
void do_info_network(void);
int tap_has_vnet_hdr(void *opaque);
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr);
int net_client_init(const char *device, const char *opts);
void net_client_uninit(NICInfo *nd);
diff --git a/qemu/vl.c b/qemu/vl.c
index f5aacf0..63e21f2 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4347,6 +4347,10 @@ int tap_has_vnet_hdr(void *opaque)
return 0;
}
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
+{
+}
+
#else /* !defined(_WIN32) */
#ifndef IFF_VNET_HDR
@@ -4367,10 +4371,11 @@ typedef struct TAPState {
char buf[TAP_BUFSIZE];
int size;
unsigned int has_vnet_hdr : 1;
+ unsigned int using_vnet_hdr : 1;
} TAPState;
-static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
- int iovcnt)
+static ssize_t tap_writev(void *opaque, const struct iovec *iov,
+ int iovcnt)
{
TAPState *s = opaque;
ssize_t len;
@@ -4382,17 +4387,51 @@ static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
return len;
}
+static ssize_t tap_receive_iov(void *opaque, const struct iovec *iov,
+ int iovcnt)
+{
+#ifdef IFF_VNET_HDR
+ TAPState *s = opaque;
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ struct iovec *iov_copy;
+ struct virtio_net_hdr hdr = { 0, };
+
+ iov_copy = alloca(sizeof(struct iovec) * (iovcnt + 1));
+
+ iov_copy[0].iov_base = &hdr;
+ iov_copy[0].iov_len = sizeof(hdr);
+
+ memcpy(&iov_copy[1], iov, sizeof(struct iovec) * iovcnt);
+
+ return tap_writev(opaque, iov_copy, iovcnt + 1);
+ }
+#endif
+
+ return tap_writev(opaque, iov, iovcnt);
+}
+
static void tap_receive(void *opaque, const uint8_t *buf, int size)
{
+ struct iovec iov[2];
+ int i = 0;
+
+#ifdef IFF_VNET_HDR
TAPState *s = opaque;
- int ret;
- for(;;) {
- ret = write(s->fd, buf, size);
- if (ret < 0 && (errno == EINTR || errno == EAGAIN)) {
- } else {
- break;
- }
+ struct virtio_net_hdr hdr = { 0, };
+
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ iov[i].iov_base = &hdr;
+ iov[i].iov_len = sizeof(hdr);
+ i++;
}
+#endif
+
+ iov[i].iov_base = (char *) buf;
+ iov[i].iov_len = size;
+ i++;
+
+ tap_writev(opaque, iov, i);
}
static int tap_can_send(void *opaque)
@@ -4421,6 +4460,21 @@ static int tap_can_send(void *opaque)
return can_receive;
}
+static int tap_send_packet(TAPState *s)
+{
+ uint8_t *buf = s->buf;
+ int size = s->size;
+
+#ifdef IFF_VNET_HDR
+ if (s->has_vnet_hdr && !s->using_vnet_hdr) {
+ buf += sizeof(struct virtio_net_hdr);
+ size -= sizeof(struct virtio_net_hdr);
+ }
+#endif
+
+ return qemu_send_packet(s->vc, buf, size);
+}
+
static void tap_send(void *opaque)
{
TAPState *s = opaque;
@@ -4430,7 +4484,7 @@ static void tap_send(void *opaque)
int err;
/* If noone can receive the packet, buffer it */
- err = qemu_send_packet(s->vc, s->buf, s->size);
+ err = tap_send_packet(s);
if (err == -EAGAIN)
return;
}
@@ -4454,7 +4508,7 @@ static void tap_send(void *opaque)
int err;
/* If noone can receive the packet, buffer it */
- err = qemu_send_packet(s->vc, s->buf, s->size);
+ err = tap_send_packet(s);
if (err == -EAGAIN)
break;
}
@@ -4469,6 +4523,17 @@ int tap_has_vnet_hdr(void *opaque)
return s ? s->has_vnet_hdr : 0;
}
+void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
+{
+ VLANClientState *vc = opaque;
+ TAPState *s = vc->opaque;
+
+ if (!s || !s->has_vnet_hdr)
+ return;
+
+ s->using_vnet_hdr = using_vnet_hdr != 0;
+}
+
#ifdef TUNSETOFFLOAD
static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
int ecn)
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-11 20:30 ` [PATCH 0/12] virtio_net perf patches Anthony Liguori
@ 2008-08-12 18:12 ` Mark McLoughlin
2008-08-12 18:28 ` Anthony Liguori
2008-08-12 23:39 ` Herbert Xu
0 siblings, 2 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-12 18:12 UTC (permalink / raw)
To: Herbert Xu; +Cc: Avi Kivity, kvm, Anthony Liguori
On Mon, 2008-08-11 at 15:30 -0500, Anthony Liguori wrote:
> Mark McLoughlin wrote:
> > Hi Avi,
> > Here's the set of patches that I think make sense to apply.
> >
>
> We probably need to disable checksum offload on the RX side until we
> figure out what to do about the broken dhclient. That's going to hit a
> lot of users otherwise.
If I could reproduce this, I'd get right on it ... but I'm not seeing
the issue here.
Wait, wait, wait. Bells are going off all of a sudden :-)
Yes, I've been through this before. See:
https://bugzilla.redhat.com/231444
So, we've had this long-standing dhclient patch in Fedora:
http://cvs.fedoraproject.org/viewcvs/*checkout*/rpms/dhcp/devel/dhcp-4.0.0-xen-checksum.patch
Herbert - any clue why this isn't upstream? That's quite surprising ...
Ah, I see Rusty moved this to linux-netdev without cc-ing:
http://marc.info/?l=linux-netdev&m=121844837826895
Herbert wrote:
"One easy way of doing this is to hook up the rx checksum offload
option in the guest with the tx offload option in the host. In
other words, when rx offload is enabled in the guest we enable
tx offload in the host, and disable it vice versa."
Are you basically just saying that guests with a "broken" dhclient
should manually disable rx checksum offload with ethtool? And that the
host should react by disabling tx offload on the tap interfacE?
Cheers,
Mark.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-12 18:12 ` Mark McLoughlin
@ 2008-08-12 18:28 ` Anthony Liguori
2008-08-12 23:39 ` Herbert Xu
1 sibling, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-08-12 18:28 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Herbert Xu, Avi Kivity, kvm
Mark McLoughlin wrote:
> Yes, I've been through this before. See:
>
> https://bugzilla.redhat.com/231444
>
> So, we've had this long-standing dhclient patch in Fedora:
>
> http://cvs.fedoraproject.org/viewcvs/*checkout*/rpms/dhcp/devel/dhcp-4.0.0-xen-checksum.patch
>
> Herbert - any clue why this isn't upstream? That's quite surprising ...
>
I've confirmed that it's not in any of the upstream releases. However,
dhcp has a weird model where you have to pay a membership fee to have
access to the CVS repository or development mailing list?
I'm not sure what's the best way to proceed.
Regards,
Anthony Liguori
> Ah, I see Rusty moved this to linux-netdev without cc-ing:
>
> http://marc.info/?l=linux-netdev&m=121844837826895
>
> Herbert wrote:
>
> "One easy way of doing this is to hook up the rx checksum offload
> option in the guest with the tx offload option in the host. In
> other words, when rx offload is enabled in the guest we enable
> tx offload in the host, and disable it vice versa."
>
> Are you basically just saying that guests with a "broken" dhclient
> should manually disable rx checksum offload with ethtool? And that the
> host should react by disabling tx offload on the tap interfacE?
>
> Cheers,
> Mark.
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/12] virtio_net perf patches
2008-08-12 18:12 ` Mark McLoughlin
2008-08-12 18:28 ` Anthony Liguori
@ 2008-08-12 23:39 ` Herbert Xu
1 sibling, 0 replies; 35+ messages in thread
From: Herbert Xu @ 2008-08-12 23:39 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Avi Kivity, kvm, Anthony Liguori
On Tue, Aug 12, 2008 at 07:12:28PM +0100, Mark McLoughlin wrote:
>
> Are you basically just saying that guests with a "broken" dhclient
> should manually disable rx checksum offload with ethtool? And that the
> host should react by disabling tx offload on the tap interfacE?
No I'm saying that everybody should default to no checksum offload.
Those that have working kernels + user-space can then enable it on
boot-up.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr
2008-08-12 17:41 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
@ 2008-08-13 9:13 ` Avi Kivity
0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-13 9:13 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm
Mark McLoughlin wrote:
> Hi Avi,
>
> Thanks for catching the build error in this one.
>
> Here's a new (yet uglier) version; the rest remain the same.
>
>
Thanks, reapplied the patches with the fix.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-12 13:51 ` Avi Kivity
2008-08-12 14:55 ` Avi Kivity
@ 2008-08-13 14:39 ` Mark McLoughlin
2008-08-13 16:24 ` Avi Kivity
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-13 14:39 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm
On Tue, 2008-08-12 at 16:51 +0300, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > Lastly, I need to add a "vnet_hdr=on" param to "-net tap" so
> > that we can know if the supplied tap fd has IFF_VNET_HDR enabled.
> > There's no interface to query that on the fd right now, and even if
> > I did cook up a patch it probably wouldn't make 2.6.27.
> >
>
> except for this, as I agree with Anthony. I think it can make it into
> 2.6.27, but if it doesn't, libvirt should simply not enable vnet_hdr
> unless it is sure qemu can query for the feature.
Okay, I've just sent a patch to add TUNGETIFF:
http://marc.info/?l=linux-netdev&m=121863813904363
See below patch for how we'd use it. Don't apply this until the kernel
patch is accepted, obviously.
The -help string is what libvirt would use to detect whether qemu has
support for this; clearly libvirt would also need to detect the
availability of TUNGETIFF in the running kernel.
Cheers,
Mark.
Subject: [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR
Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
by querying the fd with the newly added TUNGETIFF ioctl() to
see if IFF_VNET_HDR has been enabled.
Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
should check both that a) this version of qemu can handle
such an fd and b) that the TUNGETIFF ioctl() is available in
the running kernel.
We add a comment to the "qemu -help" output to indicate to
users that this version of qemu supports IFF_VNET_HDR.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 2dc1311..b9b202e 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4536,6 +4536,22 @@ void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
s->using_vnet_hdr = using_vnet_hdr != 0;
}
+static int tap_probe_vnet_hdr(int fd)
+{
+#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
+ struct ifreq ifr;
+
+ if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
+ fprintf(stderr, "TUNGETIFF ioctl() failed: %s\n", strerror(errno));
+ return 0;
+ }
+
+ return ifr.ifr_flags & IFF_VNET_HDR;
+#else
+ return 0;
+#endif
+}
+
#ifdef TUNSETOFFLOAD
static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
int ecn)
@@ -5501,7 +5517,7 @@ int net_client_init(const char *device, const char *p)
fd = strtol(buf, NULL, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
ret = -1;
- if (net_tap_fd_init(vlan, fd, 0))
+ if (net_tap_fd_init(vlan, fd, tap_probe_vnet_hdr(fd)))
ret = 0;
} else {
if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
@@ -8319,6 +8335,9 @@ static void help(int exitcode)
" and 'dfile' (default=%s);\n"
" use '[down]script=no' to disable script execution;\n"
" use 'fd=h' to connect to an already opened TAP interface\n"
+#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
+ " 'h' may optinally be configured with the IFF_VNET_HDR flag\n"
+#endif
#endif
"-net socket[,vlan=n][,fd=h][,listen=[host]:port][,connect=host:port]\n"
" connect the vlan 'n' to another VLAN using a socket connection\n"
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-13 14:39 ` [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR Mark McLoughlin
@ 2008-08-13 16:24 ` Avi Kivity
2008-08-13 16:28 ` Daniel P. Berrange
2008-08-20 17:04 ` [PATCH] " Mark McLoughlin
0 siblings, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-13 16:24 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm
Mark McLoughlin wrote:
> Okay, I've just sent a patch to add TUNGETIFF:
>
> http://marc.info/?l=linux-netdev&m=121863813904363
>
> See below patch for how we'd use it. Don't apply this until the kernel
> patch is accepted, obviously.
>
>
Please repost once that's in.
> The -help string is what libvirt would use to detect whether qemu has
> support for this; clearly libvirt would also need to detect the
> availability of TUNGETIFF in the running kernel.
>
>
Neat trick. It would be nice to have a formal qemu capabilities query,
like we have for kvm (and a proper forwards and backwards compatible
qemu monitor protocol, and...)
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-13 16:24 ` Avi Kivity
@ 2008-08-13 16:28 ` Daniel P. Berrange
2008-08-20 17:04 ` [PATCH] " Mark McLoughlin
1 sibling, 0 replies; 35+ messages in thread
From: Daniel P. Berrange @ 2008-08-13 16:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: Mark McLoughlin, kvm
On Wed, Aug 13, 2008 at 07:24:30PM +0300, Avi Kivity wrote:
> Mark McLoughlin wrote:
> >Okay, I've just sent a patch to add TUNGETIFF:
> >
> > http://marc.info/?l=linux-netdev&m=121863813904363
> >
> >See below patch for how we'd use it. Don't apply this until the kernel
> >patch is accepted, obviously.
> >
> >
>
> Please repost once that's in.
>
> >The -help string is what libvirt would use to detect whether qemu has
> >support for this; clearly libvirt would also need to detect the
> >availability of TUNGETIFF in the running kernel.
> >
> >
>
> Neat trick. It would be nice to have a formal qemu capabilities query,
> like we have for kvm (and a proper forwards and backwards compatible
> qemu monitor protocol, and...)
s/neat trick/sick hack/ but it has been working reasonably well so far. We
have maintained compat with QEMU from 0.8.0 onwards.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-13 16:24 ` Avi Kivity
2008-08-13 16:28 ` Daniel P. Berrange
@ 2008-08-20 17:04 ` Mark McLoughlin
2008-08-20 17:09 ` Mark McLoughlin
2008-08-20 17:49 ` Anthony Liguori
1 sibling, 2 replies; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-20 17:04 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm
Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
by querying the fd with the recently added TUNGETIFF ioctl() to
see if IFF_VNET_HDR has been enabled.
Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
should check both that a) this version of qemu can handle
such an fd and b) that the TUNGETIFF ioctl() is available in
the running kernel.
We add a comment to the "qemu -help" output to indicate to
users that this version of qemu supports IFF_VNET_HDR.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
qemu/vl.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/qemu/vl.c b/qemu/vl.c
index 2dc1311..b9b202e 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4536,6 +4536,22 @@ void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
s->using_vnet_hdr = using_vnet_hdr != 0;
}
+static int tap_probe_vnet_hdr(int fd)
+{
+#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
+ struct ifreq ifr;
+
+ if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
+ fprintf(stderr, "TUNGETIFF ioctl() failed: %s\n", strerror(errno));
+ return 0;
+ }
+
+ return ifr.ifr_flags & IFF_VNET_HDR;
+#else
+ return 0;
+#endif
+}
+
#ifdef TUNSETOFFLOAD
static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
int ecn)
@@ -5501,7 +5517,7 @@ int net_client_init(const char *device, const char *p)
fd = strtol(buf, NULL, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
ret = -1;
- if (net_tap_fd_init(vlan, fd, 0))
+ if (net_tap_fd_init(vlan, fd, tap_probe_vnet_hdr(fd)))
ret = 0;
} else {
if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
@@ -8319,6 +8335,9 @@ static void help(int exitcode)
" and 'dfile' (default=%s);\n"
" use '[down]script=no' to disable script execution;\n"
" use 'fd=h' to connect to an already opened TAP interface\n"
+#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
+ " 'h' may optinally be configured with the IFF_VNET_HDR flag\n"
+#endif
#endif
"-net socket[,vlan=n][,fd=h][,listen=[host]:port][,connect=host:port]\n"
" connect the vlan 'n' to another VLAN using a socket connection\n"
--
1.5.4.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:04 ` [PATCH] " Mark McLoughlin
@ 2008-08-20 17:09 ` Mark McLoughlin
2008-08-20 17:27 ` Avi Kivity
2008-08-20 17:49 ` Anthony Liguori
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-20 17:09 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm
On Wed, 2008-08-20 at 18:05 +0100, Mark McLoughlin wrote:
> Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
> by querying the fd with the recently added TUNGETIFF ioctl() to
> see if IFF_VNET_HDR has been enabled.
This was merged by Linus today:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e3b99556975907530aeb9745e7b3945a0da48f17
I would have included that link in the previous mail, but I see another
one of my "patch appended" mails ended up being quoted fully in the
commit message:
http://git.kernel.org/?p=virt/kvm/kvm-userspace.git;a=commit;h=db424ac3052fe9392e3287b6d8cbfa9e9706cecd
Other people seem to manage to avoid this, but trying it out myself with
git-am the best I can come up with is to run "git-am -i" and edit the
commit message, which sucks.
Are others using some other mbox-to-git-commits script, I wonder?
Cheers,
Mark.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:09 ` Mark McLoughlin
@ 2008-08-20 17:27 ` Avi Kivity
0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-20 17:27 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm
Mark McLoughlin wrote:
> I would have included that link in the previous mail, but I see another
> one of my "patch appended" mails ended up being quoted fully in the
> commit message:
>
> http://git.kernel.org/?p=virt/kvm/kvm-userspace.git;a=commit;h=db424ac3052fe9392e3287b6d8cbfa9e9706cecd
>
> Other people seem to manage to avoid this, but trying it out myself with
> git-am the best I can come up with is to run "git-am -i" and edit the
> commit message, which sucks.
>
> Are others using some other mbox-to-git-commits script, I wonder?
>
>
No -- that was my error; I usually edit the commit messages before
committing.
In this particular case, there was no need to edit any of the patches
(except the updated one), which caused me to miss out. I'll keep a
better lookup next time.
Oh, and there's some trick with triple-dash. If you do
> log message
>
> ---
>
> junk
>
> patch
Then git-am will ignore junk and use just the log message. That's how
the diffstats escape.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:04 ` [PATCH] " Mark McLoughlin
2008-08-20 17:09 ` Mark McLoughlin
@ 2008-08-20 17:49 ` Anthony Liguori
2008-08-20 17:51 ` Avi Kivity
2008-08-20 18:11 ` Daniel P. Berrange
1 sibling, 2 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-08-20 17:49 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Avi Kivity, kvm
Mark McLoughlin wrote:
> Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
> by querying the fd with the recently added TUNGETIFF ioctl() to
> see if IFF_VNET_HDR has been enabled.
>
> Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
> should check both that a) this version of qemu can handle
> such an fd and b) that the TUNGETIFF ioctl() is available in
> the running kernel.
>
> We add a comment to the "qemu -help" output to indicate to
> users that this version of qemu supports IFF_VNET_HDR.
>
The -help output is not a supported interface. An info command in the
monitor would be a better way to detect this.
Regards,
Anthony Liguori
> Signed-off-by: Mark McLoughlin <markmc@redhat.com>
> ---
> qemu/vl.c | 21 ++++++++++++++++++++-
> 1 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/qemu/vl.c b/qemu/vl.c
> index 2dc1311..b9b202e 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -4536,6 +4536,22 @@ void tap_using_vnet_hdr(void *opaque, int using_vnet_hdr)
> s->using_vnet_hdr = using_vnet_hdr != 0;
> }
>
> +static int tap_probe_vnet_hdr(int fd)
> +{
> +#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
> + struct ifreq ifr;
> +
> + if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
> + fprintf(stderr, "TUNGETIFF ioctl() failed: %s\n", strerror(errno));
> + return 0;
> + }
> +
> + return ifr.ifr_flags & IFF_VNET_HDR;
> +#else
> + return 0;
> +#endif
> +}
> +
> #ifdef TUNSETOFFLOAD
> static void tap_set_offload(VLANClientState *vc, int csum, int tso4, int tso6,
> int ecn)
> @@ -5501,7 +5517,7 @@ int net_client_init(const char *device, const char *p)
> fd = strtol(buf, NULL, 0);
> fcntl(fd, F_SETFL, O_NONBLOCK);
> ret = -1;
> - if (net_tap_fd_init(vlan, fd, 0))
> + if (net_tap_fd_init(vlan, fd, tap_probe_vnet_hdr(fd)))
> ret = 0;
> } else {
> if (get_param_value(ifname, sizeof(ifname), "ifname", p) <= 0) {
> @@ -8319,6 +8335,9 @@ static void help(int exitcode)
> " and 'dfile' (default=%s);\n"
> " use '[down]script=no' to disable script execution;\n"
> " use 'fd=h' to connect to an already opened TAP interface\n"
> +#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
> + " 'h' may optinally be configured with the IFF_VNET_HDR flag\n"
> +#endif
> #endif
> "-net socket[,vlan=n][,fd=h][,listen=[host]:port][,connect=host:port]\n"
> " connect the vlan 'n' to another VLAN using a socket connection\n"
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:49 ` Anthony Liguori
@ 2008-08-20 17:51 ` Avi Kivity
2008-08-20 18:01 ` Anthony Liguori
2008-08-21 9:30 ` Mark McLoughlin
2008-08-20 18:11 ` Daniel P. Berrange
1 sibling, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2008-08-20 17:51 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Mark McLoughlin, kvm
Anthony Liguori wrote:
> Mark McLoughlin wrote:
>> Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
>> by querying the fd with the recently added TUNGETIFF ioctl() to
>> see if IFF_VNET_HDR has been enabled.
>>
>> Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
>> should check both that a) this version of qemu can handle
>> such an fd and b) that the TUNGETIFF ioctl() is available in
>> the running kernel.
>>
>> We add a comment to the "qemu -help" output to indicate to
>> users that this version of qemu supports IFF_VNET_HDR.
>>
>
> The -help output is not a supported interface. An info command in the
> monitor would be a better way to detect this.
>
You need to know this before you create a virtual machine. I suggested
a 'qemu -capabilities' thing to describe what this qemu can do.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:51 ` Avi Kivity
@ 2008-08-20 18:01 ` Anthony Liguori
2008-08-21 9:30 ` Mark McLoughlin
1 sibling, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-08-20 18:01 UTC (permalink / raw)
To: Avi Kivity; +Cc: Mark McLoughlin, kvm
Avi Kivity wrote:
> Anthony Liguori wrote:
>
>> Mark McLoughlin wrote:
>>
>>> Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
>>> by querying the fd with the recently added TUNGETIFF ioctl() to
>>> see if IFF_VNET_HDR has been enabled.
>>>
>>> Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
>>> should check both that a) this version of qemu can handle
>>> such an fd and b) that the TUNGETIFF ioctl() is available in
>>> the running kernel.
>>>
>>> We add a comment to the "qemu -help" output to indicate to
>>> users that this version of qemu supports IFF_VNET_HDR.
>>>
>>>
>> The -help output is not a supported interface. An info command in the
>> monitor would be a better way to detect this.
>>
>>
>
> You need to know this before you create a virtual machine. I suggested
> a 'qemu -capabilities' thing to describe what this qemu can do.
>
$ echo -e 'info version\nquit' | qemu-system-x86_64 -S -hda /dev/null
-monitor stdio -vnc none
Okay, a little hacky, but so are a lot of things that libvirt is doing :-)
A proper -version flag that also has a bitmap of 'features' may be
reasonable too.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:49 ` Anthony Liguori
2008-08-20 17:51 ` Avi Kivity
@ 2008-08-20 18:11 ` Daniel P. Berrange
1 sibling, 0 replies; 35+ messages in thread
From: Daniel P. Berrange @ 2008-08-20 18:11 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Mark McLoughlin, Avi Kivity, kvm
On Wed, Aug 20, 2008 at 12:49:23PM -0500, Anthony Liguori wrote:
> Mark McLoughlin wrote:
> >Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
> >by querying the fd with the recently added TUNGETIFF ioctl() to
> >see if IFF_VNET_HDR has been enabled.
> >
> >Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
> >should check both that a) this version of qemu can handle
> >such an fd and b) that the TUNGETIFF ioctl() is available in
> >the running kernel.
> >
> >We add a comment to the "qemu -help" output to indicate to
> >users that this version of qemu supports IFF_VNET_HDR.
> >
>
> The -help output is not a supported interface. An info command in the
> monitor would be a better way to detect this.
The reason mark suggested -help is that libvirt already looks at -help
output to determine what command line args are supported. Whether its
stable or not, its the only way to determine supported args. Hopefully
in the future we can just look for presence of -config, and then just
use a config file to launch QEMU, so we'll have minimal dependance on
this data 'format' for -help :-)
The IFF_VNET_HDR stuff is a little different - its more a capability
of the virtualization engine that a startup option. I'd welcome some
kind of way to query capabilities in general, such as the -capabilites
arg suggested. This would include stuff like the list of supported
NIC models, supported disk backends, supported disk formats, sound
hardware, etc.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-20 17:51 ` Avi Kivity
2008-08-20 18:01 ` Anthony Liguori
@ 2008-08-21 9:30 ` Mark McLoughlin
2008-08-21 13:55 ` Avi Kivity
1 sibling, 1 reply; 35+ messages in thread
From: Mark McLoughlin @ 2008-08-21 9:30 UTC (permalink / raw)
To: Avi Kivity; +Cc: Anthony Liguori, kvm
On Wed, 2008-08-20 at 20:51 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
> > Mark McLoughlin wrote:
> >> Allow users to pass an IFF_VNET_HDR tap fd via "-net tap,fd=X"
> >> by querying the fd with the recently added TUNGETIFF ioctl() to
> >> see if IFF_VNET_HDR has been enabled.
> >>
> >> Note: users wishing to pass an IFF_VNET_HDR tap fd to qemu
> >> should check both that a) this version of qemu can handle
> >> such an fd and b) that the TUNGETIFF ioctl() is available in
> >> the running kernel.
> >>
> >> We add a comment to the "qemu -help" output to indicate to
> >> users that this version of qemu supports IFF_VNET_HDR.
> >>
> >
> > The -help output is not a supported interface. An info command in the
> > monitor would be a better way to detect this.
> >
>
> You need to know this before you create a virtual machine. I suggested
> a 'qemu -capabilities' thing to describe what this qemu can do.
I'll take a stab at "-capabilities" later.
In the mean time, it may make sense to commit the patch minus the -help
string.
Cheers,
Mark.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-21 9:30 ` Mark McLoughlin
@ 2008-08-21 13:55 ` Avi Kivity
2008-08-21 13:58 ` Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2008-08-21 13:55 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: Anthony Liguori, kvm
Mark McLoughlin wrote:
> I'll take a stab at "-capabilities" later.
>
>
It's quite tricky; anything we forget now, we won't have a way to detect
in the future, so it needs to be fairly complete.
> In the mean time, it may make sense to commit the patch minus the -help
> string.
>
>
>
Done.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] kvm: qemu: Handle tap fds with IFF_VNET_HDR
2008-08-21 13:55 ` Avi Kivity
@ 2008-08-21 13:58 ` Anthony Liguori
0 siblings, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-08-21 13:58 UTC (permalink / raw)
To: Avi Kivity; +Cc: Mark McLoughlin, kvm
Avi Kivity wrote:
> Mark McLoughlin wrote:
>> I'll take a stab at "-capabilities" later.
>>
>>
>
> It's quite tricky; anything we forget now, we won't have a way to
> detect in the future, so it needs to be fairly complete.
BTW, I prefer that it's part of the output from -version.
Regards,
Anthony Liguori
>> In the mean time, it may make sense to commit the patch minus the -help
>> string.
>>
>>
>>
>
> Done.
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2008-08-21 13:59 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-11 20:12 [PATCH 0/12] virtio_net perf patches Mark McLoughlin
2008-08-11 20:12 ` [PATCH 01/12] kvm: qemu: Fix virtio_net tx timer Mark McLoughlin
2008-08-11 20:12 ` [PATCH 02/12] kvm: qemu: Remove virtio_net tx ring-full heuristic Mark McLoughlin
2008-08-11 20:12 ` [PATCH 03/12] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY Mark McLoughlin
2008-08-11 20:12 ` [PATCH 04/12] kvm: qemu: Disable recv notifications until avail buffers exhausted Mark McLoughlin
2008-08-11 20:12 ` [PATCH 05/12] kvm: qemu: Add support for partial csums and GSO Mark McLoughlin
2008-08-11 20:12 ` [PATCH 06/12] kvm: qemu: Rename tap_readv() to tap_receive_iov() Mark McLoughlin
2008-08-11 20:12 ` [PATCH 07/12] kvm: qemu: Move some code around for the next commit Mark McLoughlin
2008-08-11 20:12 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
2008-08-11 20:12 ` [PATCH 09/12] kvm: qemu: Actually enable GSO support Mark McLoughlin
2008-08-11 20:12 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Mark McLoughlin
2008-08-11 20:12 ` [PATCH 11/12] kvm: qemu: Increase size of virtio_net rings Mark McLoughlin
2008-08-11 20:12 ` [PATCH 12/12] kvm: qemu: Drop the mutex while reading from tapfd Mark McLoughlin
2008-08-11 20:30 ` [PATCH 10/12] kvm: qemu: Add a -net tap,fd=X,vnet_hdr=on option Anthony Liguori
2008-08-12 17:41 ` [PATCH 08/12] kvm: qemu: Don't require all drivers to use virtio_net_hdr Mark McLoughlin
2008-08-13 9:13 ` Avi Kivity
2008-08-11 20:30 ` [PATCH 0/12] virtio_net perf patches Anthony Liguori
2008-08-12 18:12 ` Mark McLoughlin
2008-08-12 18:28 ` Anthony Liguori
2008-08-12 23:39 ` Herbert Xu
2008-08-12 13:51 ` Avi Kivity
2008-08-12 14:55 ` Avi Kivity
2008-08-13 14:39 ` [PATCH 1/1] kvm: qemu: Handle tap fds with IFF_VNET_HDR Mark McLoughlin
2008-08-13 16:24 ` Avi Kivity
2008-08-13 16:28 ` Daniel P. Berrange
2008-08-20 17:04 ` [PATCH] " Mark McLoughlin
2008-08-20 17:09 ` Mark McLoughlin
2008-08-20 17:27 ` Avi Kivity
2008-08-20 17:49 ` Anthony Liguori
2008-08-20 17:51 ` Avi Kivity
2008-08-20 18:01 ` Anthony Liguori
2008-08-21 9:30 ` Mark McLoughlin
2008-08-21 13:55 ` Avi Kivity
2008-08-21 13:58 ` Anthony Liguori
2008-08-20 18:11 ` Daniel P. Berrange
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox