From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48185) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aoSVG-0003yr-9O for qemu-devel@nongnu.org; Fri, 08 Apr 2016 05:14:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aoSVB-0006yP-7c for qemu-devel@nongnu.org; Fri, 08 Apr 2016 05:14:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46651) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aoSVA-0006yJ-Vn for qemu-devel@nongnu.org; Fri, 08 Apr 2016 05:14:53 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5289D65410 for ; Fri, 8 Apr 2016 09:14:52 +0000 (UTC) Received: from wei-thinkpad.nay.redhat.com (vpn1-4-44.pek2.redhat.com [10.72.4.44]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u389EnlD032469 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 8 Apr 2016 05:14:51 -0400 References: <1459711556-10273-1-git-send-email-wexu@redhat.com> <1459711556-10273-3-git-send-email-wexu@redhat.com> <57032731.8060907@redhat.com> <5707622D.7010307@redhat.com> <57076C49.8000905@redhat.com> From: Wei Xu Message-ID: <57077696.1020701@redhat.com> Date: Fri, 8 Apr 2016 17:15:02 +0800 MIME-Version: 1.0 In-Reply-To: <57076C49.8000905@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org On 2016=E5=B9=B404=E6=9C=8808=E6=97=A5 16:31, Jason Wang wrote: > > On 04/08/2016 03:47 PM, Wei Xu wrote: >> >> On 2016=E5=B9=B404=E6=9C=8805=E6=97=A5 10:47, Jason Wang wrote: >>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote: >>>> From: Wei Xu >>>> >>>> All the data packets in a tcp connection will be cached to a big buf= fer >>>> in every receive interval, and will be sent out via a timer, the >>>> 'virtio_net_rsc_timeout' controls the interval, the value will >>>> influent the >>>> performance and response of tcp connection extremely, 50000(50us) is= a >>>> experience value to gain a performance improvement, since the whql t= est >>>> sends packets every 100us, so '300000(300us)' can pass the test case= , >>>> this is also the default value, it's gonna to be tunable. >>>> >>>> The timer will only be triggered if the packets pool is not empty, >>>> and it'll drain off all the cached packets >>>> >>>> 'NetRscChain' is used to save the segments of different protocols in= a >>>> VirtIONet device. >>>> >>>> The main handler of TCP includes TCP window update, duplicated ACK >>>> check >>>> and the real data coalescing if the new segment passed sanity check >>>> and is identified as an 'wanted' one. >>>> >>>> An 'wanted' segment means: >>>> 1. Segment is within current window and the sequence is the expected >>>> one. >>>> 2. ACK of the segment is in the valid window. >>>> 3. If the ACK in the segment is a duplicated one, then it must less >>>> than 2, >>>> this is to notify upper layer TCP starting retransmission due t= o >>>> the spec. >>>> >>>> Sanity check includes: >>>> 1. Incorrect version in IP header >>>> 2. IP options & IP fragment >>>> 3. Not a TCP packets >>>> 4. Sanity size check to prevent buffer overflow attack. >>>> >>>> There maybe more cases should be considered such as ip >>>> identification other >>>> flags, while it broke the test because windows set it to the same >>>> even it's >>>> not a fragment. >>>> >>>> Normally it includes 2 typical ways to handle a TCP control flag, >>>> 'bypass' >>>> and 'finalize', 'bypass' means should be sent out directly, and >>>> 'finalize' >>>> means the packets should also be bypassed, and this should be done >>>> after searching for the same connection packets in the pool and send= ing >>>> all of them out, this is to avoid out of data. >>>> >>>> All the 'SYN' packets will be bypassed since this always begin a new= ' >>>> connection, other flags such 'FIN/RST' will trigger a finalization, >>>> because >>>> this normally happens upon a connection is going to be closed, an >>>> 'URG' packet >>>> also finalize current coalescing unit. >>>> >>>> Statistics can be used to monitor the basic coalescing status, the >>>> 'out of order' >>>> and 'out of window' means how many retransmitting packets, thus >>>> describe the >>>> performance intuitively. >>>> >>>> Signed-off-by: Wei Xu >>>> --- >>>> hw/net/virtio-net.c | 480 >>>> ++++++++++++++++++++++++++++++++++++++++- >>>> include/hw/virtio/virtio-net.h | 1 + >>>> include/hw/virtio/virtio.h | 72 +++++++ >>>> 3 files changed, 552 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>> index bd91a4b..81e8e71 100644 >>>> --- a/hw/net/virtio-net.c >>>> +++ b/hw/net/virtio-net.c >>>> @@ -15,10 +15,12 @@ >>>> #include "qemu/iov.h" >>>> #include "hw/virtio/virtio.h" >>>> #include "net/net.h" >>>> +#include "net/eth.h" >>>> #include "net/checksum.h" >>>> #include "net/tap.h" >>>> #include "qemu/error-report.h" >>>> #include "qemu/timer.h" >>>> +#include "qemu/sockets.h" >>>> #include "hw/virtio/virtio-net.h" >>>> #include "net/vhost_net.h" >>>> #include "hw/virtio/virtio-bus.h" >>>> @@ -38,6 +40,24 @@ >>>> #define endof(container, field) \ >>>> (offsetof(container, field) + sizeof(((container *)0)->field)= ) >>>> +#define VIRTIO_NET_IP4_ADDR_SIZE 8 /* ipv4 saddr + dadd= r */ >>>> +#define VIRTIO_NET_TCP_PORT_SIZE 4 /* sport + dport */ >>>> + >>>> +/* IPv4 max payload, 16 bits in the header */ >>>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header= )) >>>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535 >>>> + >>>> +/* header lenght value in ip header without option */ >>> typo here. >> Thanks. >>>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5 >>>> + >>>> +/* Purge coalesced packets timer interval */ >>>> +#define VIRTIO_NET_RSC_INTERVAL 300000 >>>> + >>>> +/* This value affects the performance a lot, and should be tuned >>>> carefully, >>>> + '300000'(300us) is the recommended value to pass the WHQL test, >>>> '50000' can >>>> + gain 2x netperf throughput with tso/gso/gro 'off'. */ >>>> +static uint32_t virtio_net_rsc_timeout =3D VIRTIO_NET_RSC_INTERVAL; >>> Like we've discussed in previous versions, need we add another proper= ty >>> for this? >> Do you know how to make this a tunable parameter to guest? can this >> parameter be set via control queue? > It's possible I think. > > > [...] > >>>> + >>>> +static void virtio_net_rsc_purge(void *opq) >>>> +{ >>>> + NetRscChain *chain =3D (NetRscChain *)opq; >>>> + NetRscSeg *seg, *rn; >>>> + >>>> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) { >>>> + if (virtio_net_rsc_drain_seg(chain, seg) =3D=3D 0) { >>>> + chain->stat.purge_failed++; >>>> + continue; >>> Is it better to break here, consider we fail to do the receive? >> Actually this fails only when receive fails according to the test, but >> should drain all the segments here, >> if break here, then will have to free all the buffers, is it possible >> a successful receiving followed by a failed one? > I'd say it's possible, e.g guest just finish rx buffer refills. > >> if that does happens, i preferred give it a shot other than free it >> directly. > Ok. > > [...] > >>>> + >>>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, >>>> NetClientState *nc, >>>> + const uint8_t *buf, size_t siz= e) >>>> +{ >>>> + uint16_t hdr_len; >>>> + NetRscSeg *seg; >>>> + >>>> + hdr_len =3D ((VirtIONet *)(chain->n))->guest_hdr_len; >>>> + seg =3D g_malloc(sizeof(NetRscSeg)); >>>> + seg->buf =3D g_malloc(hdr_len + sizeof(struct eth_header)\ >>>> + + sizeof(struct ip6_header) + >>>> VIRTIO_NET_MAX_TCP_PAYLOAD); >>> Why ip6_header here? >> ipv6 packets can carry 65535(2^16) bytes pure data payload, that means >> the header is not included. >> but ipv4 is included, i choose the maximum here. > Let's add a comment here, since the patch's title is to implement ipv4 > coalescing. ok. > [...] > >>>> + >>>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, >>>> NetRscSeg *seg, >>>> + const uint8_t *buf, size_t size, NetRscUnit >>>> *unit) >>> indentation is wrong here. >> Should it be aligned from parameter begin? > Looks like we'd better do this. ok. > >> i was going to save an extra line parameter. > [...] >