From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39669) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aoRox-0005eI-Ud for qemu-devel@nongnu.org; Fri, 08 Apr 2016 04:31:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aoRos-0005PB-VG for qemu-devel@nongnu.org; Fri, 08 Apr 2016 04:31:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48007) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aoRos-0005Oo-P1 for qemu-devel@nongnu.org; Fri, 08 Apr 2016 04:31:10 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3D99A80F6B for ; Fri, 8 Apr 2016 08:31:10 +0000 (UTC) References: <1459711556-10273-1-git-send-email-wexu@redhat.com> <1459711556-10273-3-git-send-email-wexu@redhat.com> <57032731.8060907@redhat.com> <5707622D.7010307@redhat.com> From: Jason Wang Message-ID: <57076C49.8000905@redhat.com> Date: Fri, 8 Apr 2016 16:31:05 +0800 MIME-Version: 1.0 In-Reply-To: <5707622D.7010307@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Wei Xu , qemu-devel@nongnu.org On 04/08/2016 03:47 PM, Wei Xu wrote: > > > On 2016=E5=B9=B404=E6=9C=8805=E6=97=A5 10:47, Jason Wang wrote: >> >> On 04/04/2016 03:25 AM, wexu@redhat.com wrote: >>> From: Wei Xu >>> >>> All the data packets in a tcp connection will be cached to a big buff= er >>> in every receive interval, and will be sent out via a timer, the >>> 'virtio_net_rsc_timeout' controls the interval, the value will >>> influent the >>> performance and response of tcp connection extremely, 50000(50us) is = a >>> experience value to gain a performance improvement, since the whql te= st >>> sends packets every 100us, so '300000(300us)' can pass the test case, >>> this is also the default value, it's gonna to be tunable. >>> >>> The timer will only be triggered if the packets pool is not empty, >>> and it'll drain off all the cached packets >>> >>> 'NetRscChain' is used to save the segments of different protocols in = a >>> VirtIONet device. >>> >>> The main handler of TCP includes TCP window update, duplicated ACK >>> check >>> and the real data coalescing if the new segment passed sanity check >>> and is identified as an 'wanted' one. >>> >>> An 'wanted' segment means: >>> 1. Segment is within current window and the sequence is the expected >>> one. >>> 2. ACK of the segment is in the valid window. >>> 3. If the ACK in the segment is a duplicated one, then it must less >>> than 2, >>> this is to notify upper layer TCP starting retransmission due to >>> the spec. >>> >>> Sanity check includes: >>> 1. Incorrect version in IP header >>> 2. IP options & IP fragment >>> 3. Not a TCP packets >>> 4. Sanity size check to prevent buffer overflow attack. >>> >>> There maybe more cases should be considered such as ip >>> identification other >>> flags, while it broke the test because windows set it to the same >>> even it's >>> not a fragment. >>> >>> Normally it includes 2 typical ways to handle a TCP control flag, >>> 'bypass' >>> and 'finalize', 'bypass' means should be sent out directly, and >>> 'finalize' >>> means the packets should also be bypassed, and this should be done >>> after searching for the same connection packets in the pool and sendi= ng >>> all of them out, this is to avoid out of data. >>> >>> All the 'SYN' packets will be bypassed since this always begin a new' >>> connection, other flags such 'FIN/RST' will trigger a finalization, >>> because >>> this normally happens upon a connection is going to be closed, an >>> 'URG' packet >>> also finalize current coalescing unit. >>> >>> Statistics can be used to monitor the basic coalescing status, the >>> 'out of order' >>> and 'out of window' means how many retransmitting packets, thus >>> describe the >>> performance intuitively. >>> >>> Signed-off-by: Wei Xu >>> --- >>> hw/net/virtio-net.c | 480 >>> ++++++++++++++++++++++++++++++++++++++++- >>> include/hw/virtio/virtio-net.h | 1 + >>> include/hw/virtio/virtio.h | 72 +++++++ >>> 3 files changed, 552 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index bd91a4b..81e8e71 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -15,10 +15,12 @@ >>> #include "qemu/iov.h" >>> #include "hw/virtio/virtio.h" >>> #include "net/net.h" >>> +#include "net/eth.h" >>> #include "net/checksum.h" >>> #include "net/tap.h" >>> #include "qemu/error-report.h" >>> #include "qemu/timer.h" >>> +#include "qemu/sockets.h" >>> #include "hw/virtio/virtio-net.h" >>> #include "net/vhost_net.h" >>> #include "hw/virtio/virtio-bus.h" >>> @@ -38,6 +40,24 @@ >>> #define endof(container, field) \ >>> (offsetof(container, field) + sizeof(((container *)0)->field)) >>> +#define VIRTIO_NET_IP4_ADDR_SIZE 8 /* ipv4 saddr + daddr = */ >>> +#define VIRTIO_NET_TCP_PORT_SIZE 4 /* sport + dport */ >>> + >>> +/* IPv4 max payload, 16 bits in the header */ >>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header)= ) >>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535 >>> + >>> +/* header lenght value in ip header without option */ >> typo here. > Thanks. >> >>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5 >>> + >>> +/* Purge coalesced packets timer interval */ >>> +#define VIRTIO_NET_RSC_INTERVAL 300000 >>> + >>> +/* This value affects the performance a lot, and should be tuned >>> carefully, >>> + '300000'(300us) is the recommended value to pass the WHQL test, >>> '50000' can >>> + gain 2x netperf throughput with tso/gso/gro 'off'. */ >>> +static uint32_t virtio_net_rsc_timeout =3D VIRTIO_NET_RSC_INTERVAL; >> Like we've discussed in previous versions, need we add another propert= y >> for this? > Do you know how to make this a tunable parameter to guest? can this > parameter be set via control queue? It's possible I think. [...] >>> + >>> +static void virtio_net_rsc_purge(void *opq) >>> +{ >>> + NetRscChain *chain =3D (NetRscChain *)opq; >>> + NetRscSeg *seg, *rn; >>> + >>> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) { >>> + if (virtio_net_rsc_drain_seg(chain, seg) =3D=3D 0) { >>> + chain->stat.purge_failed++; >>> + continue; >> Is it better to break here, consider we fail to do the receive? > Actually this fails only when receive fails according to the test, but > should drain all the segments here, > if break here, then will have to free all the buffers, is it possible > a successful receiving followed by a failed one? I'd say it's possible, e.g guest just finish rx buffer refills. > if that does happens, i preferred give it a shot other than free it > directly. Ok. [...] >>> + >>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, >>> NetClientState *nc, >>> + const uint8_t *buf, size_t size= ) >>> +{ >>> + uint16_t hdr_len; >>> + NetRscSeg *seg; >>> + >>> + hdr_len =3D ((VirtIONet *)(chain->n))->guest_hdr_len; >>> + seg =3D g_malloc(sizeof(NetRscSeg)); >>> + seg->buf =3D g_malloc(hdr_len + sizeof(struct eth_header)\ >>> + + sizeof(struct ip6_header) + >>> VIRTIO_NET_MAX_TCP_PAYLOAD); >> Why ip6_header here? > ipv6 packets can carry 65535(2^16) bytes pure data payload, that means > the header is not included. > but ipv4 is included, i choose the maximum here. Let's add a comment here, since the patch's title is to implement ipv4 coalescing. [...] >>> + >>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, >>> NetRscSeg *seg, >>> + const uint8_t *buf, size_t size, NetRscUnit >>> *unit) >> indentation is wrong here. > Should it be aligned from parameter begin? Looks like we'd better do this. > i was going to save an extra line parameter. [...]