From: Avi Kivity
Subject: Re: [PATCH 5/5] kvm: qemu: Improve virtio_net recv buffer allocation scheme
Date: Sun, 12 Oct 2008 12:00:26 +0200
Message-ID: <48F1CABA.8010301@redhat.com>
References: <> <1223494513-18826-1-git-send-email-markmc@redhat.com>
 <1223494513-18826-2-git-send-email-markmc@redhat.com>
 <1223494513-18826-3-git-send-email-markmc@redhat.com>
 <1223494513-18826-4-git-send-email-markmc@redhat.com>
 <1223494513-18826-5-git-send-email-markmc@redhat.com>
Cc: kvm@vger.kernel.org, Rusty Russell, Herbert Xu, Anthony Liguori
To: Mark McLoughlin
In-Reply-To: <1223494513-18826-5-git-send-email-markmc@redhat.com>

Mark McLoughlin wrote:
> From: Herbert Xu
>
> Currently, in order to receive large packets, the guest must allocate
> max-sized packet buffers and pass them to the host. Each of these
> max-sized packets occupy 20 ring entries, which means we can only
> transfer a maximum of 12 packets in a single batch with a 256 entry
> ring.
>
> When receiving packets from external networks, we only receive MTU
> sized packets and so the throughput observed is throttled by the
> number of packets the ring can hold.
>
> Implement the VIRTIO_NET_F_MRG_RXBUF feature to let guests know that
> we can merge smaller buffers together in order to handle large packets.
>
> This scheme allows us to be efficient in our use of ring entries
> while still supporting large packets. Benchmarking using netperf from
> an external machine to a guest over a 10Gb/s network shows a 100%
> improvement from ~1Gb/s to ~2Gb/s.
> With a local host->guest benchmark
> with GSO disabled on the host side, throughput was seen to increase
> from 700Mb/s to 1.7Gb/s.
>
> Based on a patch from Herbert, with the feature renamed from
> "datahead" and some re-factoring for readability.
>
>
> diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
> index 403247b..afa5fe5 100644
> --- a/qemu/hw/virtio-net.c
> +++ b/qemu/hw/virtio-net.c
> @@ -34,9 +34,13 @@
>  #define VIRTIO_NET_F_HOST_TSO6  12      /* Host can handle TSOv6 in. */
>  #define VIRTIO_NET_F_HOST_ECN   13      /* Host can handle TSO[6] w/ ECN in. */
>  #define VIRTIO_NET_F_HOST_UFO   14      /* Host can handle UFO in. */
> +#define VIRTIO_NET_F_MRG_RXBUF  15      /* Host can merge receive buffers. */
>

What's the status of the guest side of this feature?

>  #define TX_TIMER_INTERVAL 150000 /* 150 us */
>
> +/* Should be the largest MAX_SKB_FRAGS supported. */
> +#define VIRTIO_NET_MAX_FRAGS 18
> +

This should be advertised by the host to the guest (or vice versa?).
We're embedding Linux-specific magic numbers in a guest-OS-agnostic
ABI. Preferably, there shouldn't be a limit at all.

> @@ -209,7 +220,12 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
>      if (virtqueue_pop(n->rx_vq, &elem) == 0)
>          return;
>
> -    if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
> +    if (n->mergeable_rx_bufs) {
> +        if (elem.in_num < 1 || elem.in_sg[0].iov_len < TARGET_PAGE_SIZE) {
> +            fprintf(stderr, "virtio-net IOV is irregular\n");
> +            exit(1);
> +        }
>

Again, this is burying details of the current Linux stack in the ABI.
The Linux stack may change so that it is no longer page-oriented, or
this may not fit well with how Windows views things. Can this be made
not to depend on the size of the iov elements?

> +    } else if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
>          fprintf(stderr, "virtio-net header not in first element\n");
>          exit(1);
>      }
> @@ -229,11 +245,49 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
>      }
>
>      /* copy in packet. ugh */
> -    iov_fill(&elem.in_sg[1], elem.in_num - 1,
> -             buf + offset, size - offset);
>
> -    /* signal other side */
> -    virtqueue_push(n->rx_vq, &elem, total);
> +    if (n->mergeable_rx_bufs) {
> +        int i = 0;
> +
> +        elem.in_sg[0].iov_base += sizeof(*hdr);
> +        elem.in_sg[0].iov_len -= sizeof(*hdr);
> +
> +        offset += iov_fill(&elem.in_sg[0], elem.in_num,
> +                           buf + offset, size - offset);
> +
> +        /* signal other side */
> +        virtqueue_fill(n->rx_vq, &elem, total, i++);
> +
> +        while (offset < size) {
> +            int len;
> +
> +            if (virtqueue_pop(n->rx_vq, &elem) == 0) {
> +                fprintf(stderr, "virtio-net truncating packet\n");
> +                exit(1);
> +            }
> +
> +            if (elem.in_num < 1 || elem.in_sg[0].iov_len < TARGET_PAGE_SIZE) {
> +                fprintf(stderr, "virtio-net IOV is irregular\n");
> +                exit(1);
> +            }
> +
> +            len = iov_fill(&elem.in_sg[0], elem.in_num,
> +                           buf + offset, size - offset);
> +
> +            virtqueue_fill(n->rx_vq, &elem, len, i++);
> +
> +            offset += len;
> +        }
> +
> +        virtqueue_flush(n->rx_vq, i);
> +    } else {
> +        iov_fill(&elem.in_sg[1], elem.in_num - 1,
> +                 buf + offset, size - offset);
> +
> +        /* signal other side */
> +        virtqueue_push(n->rx_vq, &elem, total);
> +    }
> +

Can we merge the two sides of the if () so that the only difference is
the number of times we go through the loop?

Anthony, please review this as well; my virtio-foo is pretty superficial.

-- 
error compiling committee.c: too many arguments to function