From: Jason Wang <jasowang@redhat.com>
To: wexu@redhat.com, qemu-devel@nongnu.org
Cc: marcel@redhat.com, mst@redhat.com, victork@redhat.com,
dfleytma@redhat.com, yvugenfi@redhat.com
Subject: Re: [Qemu-devel] [ RFC Patch v6 1/3] virtio-net rsc: support coalescing ipv4 tcp traffic
Date: Mon, 30 May 2016 12:20:14 +0800 [thread overview]
Message-ID: <574BBF7E.4050609@redhat.com> (raw)
In-Reply-To: <1464453454-5703-2-git-send-email-wexu@redhat.com>
On 2016年05月29日 00:37, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 50000(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '300000(300us)' can pass the test case,
> this is also the default value, it's tunable via the command line
> parameter 'rsc_interval' with 'virtio-net-pci' device, for example, below
> parameter is to launch a guest with interval set as '500000'.
Does the value make sense if it was smaller than 1us? If not, why not
just make the unit to be 1us?
>
> 'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=500000'
> will
>
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets.
>
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
>
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
>
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. 'ACK' of the segment is in the valid window.
>
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
>
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
>
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
>
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit.
>
> Statistics can be used to monitor the basic coalescing status, the 'out of order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
We usually don't add device specific monitor command. Maybe a new
control vq command ethtool -S in guest in the future. I was thinking of
removing those counters since it was never used in this series.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
> hw/net/virtio-net.c | 498 +++++++++++++++++++++++++++-
> include/hw/virtio/virtio-net.h | 2 +
> include/hw/virtio/virtio.h | 75 +++++
> include/standard-headers/linux/virtio_net.h | 1 +
For RFC, it's ok. But for formal patch, this is not the correct way to
modify Linux headers. There's a script in
scripts/update-linux-headers.sh which was used to sync it from Linux
source. This means, it must be merged in Linux or at least in
maintainer's tree first.
> 4 files changed, 575 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..b3bb63b 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -15,10 +15,12 @@
> #include "qemu/iov.h"
> #include "hw/virtio/virtio.h"
> #include "net/net.h"
> +#include "net/eth.h"
> #include "net/checksum.h"
> #include "net/tap.h"
> #include "qemu/error-report.h"
> #include "qemu/timer.h"
> +#include "qemu/sockets.h"
> #include "hw/virtio/virtio-net.h"
> #include "net/vhost_net.h"
> #include "hw/virtio/virtio-bus.h"
> @@ -38,6 +40,25 @@
> #define endof(container, field) \
> (offsetof(container, field) + sizeof(((container *)0)->field))
>
> +#define VIRTIO_NET_IP4_ADDR_SIZE 8 /* ipv4 saddr + daddr */
> +#define VIRTIO_NET_TCP_PORT_SIZE 4 /* sport + dport */
> +
> +#define VIRTIO_NET_TCP_FLAG 0x3F
> +#define VIRTIO_NET_TCP_HDR_LENGTH 0xF000
> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
Is this still true if window scaling is enabled? And need to make sure
your ack/seq processing can work for window scaling.
> +
> +/* header length value in ip header without option */
> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
> +
> +/* Purge coalesced packets timer interval, This value affects the performance
> + a lot, and should be tuned carefully, '300000'(300us) is the recommended
> + value to pass the WHQL test, '50000' can gain 2x netperf throughput with
> + tso/gso/gro 'off'. */
> +#define VIRTIO_NET_RSC_INTERVAL 300000
Do we need a new property for this value?
> +
> typedef struct VirtIOFeature {
> uint32_t flags;
> size_t end;
> @@ -1089,7 +1110,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
> return 0;
> }
>
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t virtio_net_do_receive(NetClientState *nc,
> + const uint8_t *buf, size_t size)
> {
> VirtIONet *n = qemu_get_nic_opaque(nc);
> VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1685,6 +1707,474 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> return 0;
> }
>
> +static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
> + const uint8_t *buf, NetRscUnit* unit)
> +{
> + uint16_t hdr_len;
> + uint16_t ip_hdrlen;
> + struct ip_header *ip;
> +
> + hdr_len = chain->n->guest_hdr_len;
> + ip = (struct ip_header *)(buf + hdr_len + sizeof(struct eth_header));
> + unit->ip = (void *)ip;
> + ip_hdrlen = (ip->ip_ver_len & 0xF) << 2;
> + unit->ip_plen = &ip->ip_len;
> + unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip) + ip_hdrlen);
> + unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
> + unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
> +}
> +
> +static void virtio_net_rsc_ipv4_checksum(struct virtio_net_hdr *vhdr,
> + struct ip_header *ip)
> +{
> + uint32_t sum;
> +
> + ip->ip_sum = 0;
> + sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
> + ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
> + vhdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
> + vhdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +}
> +
> +static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
> +{
> + int ret;
> + struct virtio_net_hdr *h;
> +
> + h = (struct virtio_net_hdr *)seg->buf;
> + virtio_net_rsc_ipv4_checksum(h, seg->unit.ip);
> + ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> + QTAILQ_REMOVE(&chain->buffers, seg, next);
> + g_free(seg->buf);
> + g_free(seg);
> +
> + return ret;
> +}
> +
> +static void virtio_net_rsc_purge(void *opq)
> +{
> + NetRscSeg *seg, *rn;
> + NetRscChain *chain = (NetRscChain *)opq;
> +
> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
> + if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> + chain->stat.purge_failed++;
> + continue;
Why need this?
> + }
> + }
> +
> + chain->stat.timer++;
> + if (!QTAILQ_EMPTY(&chain->buffers)) {
> + timer_mod(chain->drain_timer,
> + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
Consider the timer is invisible to guest, why use QEMU_CLOCK_VIRTUAL?
> + }
> +}
> +
> +static void virtio_net_rsc_cleanup(VirtIONet *n)
> +{
> + NetRscChain *chain, *rn_chain;
> + NetRscSeg *seg, *rn_seg;
> +
> + QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
> + QTAILQ_REMOVE(&chain->buffers, seg, next);
> + g_free(seg->buf);
> + g_free(seg);
> + }
> +
> + timer_del(chain->drain_timer);
> + timer_free(chain->drain_timer);
> + QTAILQ_REMOVE(&n->rsc_chains, chain, next);
> + g_free(chain);
> + }
> +}
> +
> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
> + const uint8_t *buf, size_t size)
> +{
> + uint16_t hdr_len;
> + NetRscSeg *seg;
> +
> + hdr_len = chain->n->guest_hdr_len;
> + seg = g_malloc(sizeof(NetRscSeg));
> + seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
> + + VIRTIO_NET_MAX_TCP_PAYLOAD);
> + memcpy(seg->buf, buf, size);
> + seg->size = size;
> + seg->packets = 1;
> + seg->dup_ack = 0;
> + seg->is_coalesced = 0;
> + seg->nc = nc;
> +
> + QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
> + chain->stat.cache++;
> +
> + virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
> +}
> +
> +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain,
> + NetRscSeg *seg, const uint8_t *buf,
> + struct tcp_header *n_tcp,
> + struct tcp_header *o_tcp)
> +{
> + uint32_t nack, oack;
> + uint16_t nwin, owin;
> +
> + nack = htonl(n_tcp->th_ack);
> + nwin = htons(n_tcp->th_win);
> + oack = htonl(o_tcp->th_ack);
> + owin = htons(o_tcp->th_win);
> +
> + if ((nack - oack) >= VIRTIO_NET_MAX_TCP_PAYLOAD) {
> + chain->stat.ack_out_of_win++;
> + return RSC_FINAL;
> + } else if (nack == oack) {
> + /* duplicated ack or window probe */
> + if (nwin == owin) {
> + /* duplicated ack, add dup ack count due to whql test up to 1 */
> + chain->stat.dup_ack++;
> + return RSC_FINAL;
> + } else {
> + /* Coalesce window update */
> + o_tcp->th_win = n_tcp->th_win;
> + chain->stat.win_update++;
> + return RSC_COALESCE;
> + }
> + } else {
> + /* pure ack, go to 'C', finalize*/
> + chain->stat.pure_ack++;
> + return RSC_FINAL;
> + }
> +}
> +
> +static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain,
> + NetRscSeg *seg, const uint8_t *buf,
> + NetRscUnit *n_unit)
> +{
> + void *data;
> + uint16_t o_ip_len;
> + uint32_t nseq, oseq;
> + NetRscUnit *o_unit;
> +
> + o_unit = &seg->unit;
> + o_ip_len = htons(*o_unit->ip_plen);
> + nseq = htonl(n_unit->tcp->th_seq);
> + oseq = htonl(o_unit->tcp->th_seq);
> +
> + /* out of order or retransmitted. */
> + if ((nseq - oseq) > VIRTIO_NET_MAX_TCP_PAYLOAD) {
> + chain->stat.data_out_of_win++;
> + return RSC_FINAL;
> + }
> +
> + data = ((uint8_t *)n_unit->tcp) + n_unit->tcp_hdrlen;
> + if (nseq == oseq) {
> + if ((o_unit->payload == 0) && n_unit->payload) {
> + /* From no payload to payload, normal case, not a dup ack or etc */
> + chain->stat.data_after_pure_ack++;
> + goto coalesce;
> + } else {
> + return virtio_net_rsc_handle_ack(chain, seg, buf,
> + n_unit->tcp, o_unit->tcp);
> + }
> + } else if ((nseq - oseq) != o_unit->payload) {
> + /* Not a consistent packet, out of order */
> + chain->stat.data_out_of_order++;
> + return RSC_FINAL;
> + } else {
> +coalesce:
> + if ((o_ip_len + n_unit->payload) > chain->max_payload) {
> + chain->stat.over_size++;
> + return RSC_FINAL;
> + }
> +
> + /* Here comes the right data, the payload length in v4/v6 is different,
> + so use the field value to update and record the new data len */
> + o_unit->payload += n_unit->payload; /* update new data len */
> +
> + /* update field in ip header */
> + *o_unit->ip_plen = htons(o_ip_len + n_unit->payload);
> +
> + /* Bring 'PUSH' big, the whql test guide says 'PUSH' can be coalesced
> + for windows guest, while this may change the behavior for linux
> + guest. */
> + o_unit->tcp->th_offset_flags = n_unit->tcp->th_offset_flags;
> +
> + o_unit->tcp->th_ack = n_unit->tcp->th_ack;
> + o_unit->tcp->th_win = n_unit->tcp->th_win;
> +
> + memmove(seg->buf + seg->size, data, n_unit->payload);
> + seg->size += n_unit->payload;
> + seg->packets++;
> + chain->stat.coalesced++;
> + return RSC_COALESCE;
> + }
> +}
> +
> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
> + const uint8_t *buf, size_t size,
> + NetRscUnit *unit)
> +{
> + struct ip_header *ip1, *ip2;
> +
> + ip1 = (struct ip_header *)(unit->ip);
> + ip2 = (struct ip_header *)(seg->unit.ip);
> + if ((ip1->ip_src ^ ip2->ip_src) || (ip1->ip_dst ^ ip2->ip_dst)
> + || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
> + || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
> + chain->stat.no_match++;
> + return RSC_NO_MATCH;
> + }
> +
> + return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
> +}
> +
> +/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
> + * to prevent out of order */
> +static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
> + struct tcp_header *tcp)
> +{
> + uint16_t tcp_hdr;
> + uint16_t tcp_flag;
> +
> + tcp_flag = htons(tcp->th_offset_flags);
> + tcp_hdr = (tcp_flag & VIRTIO_NET_TCP_HDR_LENGTH) >> 10;
> + tcp_flag &= VIRTIO_NET_TCP_FLAG;
> + tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
> + if (tcp_flag & TH_SYN) {
> + chain->stat.tcp_syn++;
> + return RSC_BYPASS;
> + }
> +
> + if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
> + chain->stat.tcp_ctrl_drain++;
> + return RSC_FINAL;
> + }
> +
> + if (tcp_hdr > sizeof(struct tcp_header)) {
> + chain->stat.tcp_all_opt++;
> + return RSC_FINAL;
> + }
> +
> + return RSC_WANT;
> +}
> +
> +static bool virtio_net_rsc_empty_cache(NetRscChain *chain, NetClientState *nc,
> + const uint8_t *buf, size_t size)
> +{
The name is really confusing, it in fact cache the buf if the cache is
empty. So I guess what you mean is "virtio_net_rsc_cache_if_empty()?".
But the question is why need a specific function like this, can we
simply add the logic to virtio_net_rsc_do_coalesce()? (And looks like
most of the logic has been there, except for timer and counter.
> + if (QTAILQ_EMPTY(&chain->buffers)) {
> + chain->stat.empty_cache++;
> + virtio_net_rsc_cache_buf(chain, nc, buf, size);
> + timer_mod(chain->drain_timer,
> + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
> + const uint8_t *buf, size_t size,
> + NetRscUnit *unit)
> +{
> + int ret;
> + NetRscSeg *seg, *nseg;
> +
> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> + ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
> +
> + if (ret == RSC_FINAL) {
> + if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> + /* Send failed */
> + chain->stat.final_failed++;
> + return 0;
> + }
> +
> + /* Send current packet */
> + return virtio_net_do_receive(nc, buf, size);
> + } else if (ret == RSC_NO_MATCH) {
> + continue;
> + } else {
> + /* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
> + seg->is_coalesced = 1;
> + return size;
> + }
> + }
> +
> + chain->stat.no_match_cache++;
> + virtio_net_rsc_cache_buf(chain, nc, buf, size);
> + return size;
> +}
> +
> +/* Drain a connection data, this is to avoid out of order segments */
> +static size_t virtio_net_rsc_drain_flow(NetRscChain *chain, NetClientState *nc,
> + const uint8_t *buf, size_t size,
> + uint16_t ip_start, uint16_t ip_size,
> + uint16_t tcp_port, uint16_t port_size)
> +{
> + NetRscSeg *seg, *nseg;
> +
> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> + if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
> + || memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
The header comparison is rather similar to what you've done in
virtio_net_rsc_coalesce4/6(). Any chance to reduce the duplication?
> + continue;
> + }
> + if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> + chain->stat.drain_failed++;
> + }
> +
> + break;
> + }
> +
> + return virtio_net_do_receive(nc, buf, size);
> +}
> +
> +static int32_t virtio_net_rsc_sanity_check4(NetRscChain *chain,
> + struct ip_header *ip,
> + const uint8_t *buf, size_t size)
> +{
> + uint16_t ip_len;
> + uint16_t hdr_len;
> +
> + hdr_len = chain->n->guest_hdr_len;
> + if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header)
> + + sizeof(struct tcp_header))) {
It's interesting that you don't have a stat counter for this case.
> + return RSC_BYPASS;
> + }
> +
> + /* Not an ipv4 one */
> + if (((ip->ip_ver_len & 0xF0) >> 4) != IP_HEADER_VERSION_4) {
> + chain->stat.ip_option++;
> + return RSC_BYPASS;
> + }
> +
> + /* Don't handle packets with ip option */
> + if (VIRTIO_NET_IP4_HEADER_LENGTH != (ip->ip_ver_len & 0xF)) {
(ip-> ...) != VIRTIO_NET_IP4_HEADER_LENGTH
> + chain->stat.ip_option++;
> + return RSC_BYPASS;
> + }
> +
> + if (ip->ip_p != IPPROTO_TCP) {
> + chain->stat.bypass_not_tcp++;
> + return RSC_BYPASS;
> + }
> +
> + /* Don't handle packets with ip fragment */
> + if (!(htons(ip->ip_off) & IP_DF)) {
> + chain->stat.ip_frag++;
> + return RSC_BYPASS;
> + }
> +
> + ip_len = htons(ip->ip_len);
> + if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
> + || ip_len > (size - hdr_len - sizeof(struct eth_header))) {
> + chain->stat.ip_hacked++;
> + return RSC_BYPASS;
> + }
> +
> + return RSC_WANT;
> +}
> +
> +static size_t virtio_net_rsc_receive4(NetRscChain *chain, NetClientState* nc,
> + const uint8_t *buf, size_t size)
> +{
> + int32_t ret;
> + uint16_t hdr_len;
> + NetRscUnit unit;
> +
> + hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> + virtio_net_rsc_extract_unit4(chain, buf, &unit);
Is this safe for you do header analysis before doing sanity check?
> + if (RSC_WANT != virtio_net_rsc_sanity_check4(chain, unit.ip, buf, size)) {
virtio_net_rsc_sanity_check4() = RSC_WANT please. I've pointed out this
kind of style issues several times.
> + return virtio_net_do_receive(nc, buf, size);
> + }
> +
> + ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
> + if (ret == RSC_BYPASS) {
> + return virtio_net_do_receive(nc, buf, size);
> + } else if (ret == RSC_FINAL) {
> + return virtio_net_rsc_drain_flow(chain, nc, buf, size,
> + ((hdr_len + sizeof(struct eth_header)) + 12),
> + VIRTIO_NET_IP4_ADDR_SIZE,
> + hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header),
> + VIRTIO_NET_TCP_PORT_SIZE);
> + }
> +
> + if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
> + return size;
> + }
> +
> + return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
> +}
> +
> +static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
> + NetClientState *nc,
> + uint16_t proto)
> +{
> + NetRscChain *chain;
> +
> + if (proto != (uint16_t)ETH_P_IP) {
> + return NULL;
> + }
> +
> + QTAILQ_FOREACH(chain, &n->rsc_chains, next) {
> + if (chain->proto == proto) {
> + return chain;
> + }
> + }
> +
> + chain = g_malloc(sizeof(*chain));
> + chain->n = n;
> + chain->proto = proto;
> + chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
> + chain->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> + chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> + virtio_net_rsc_purge, chain);
> + memset(&chain->stat, 0, sizeof(chain->stat));
> +
> + QTAILQ_INIT(&chain->buffers);
> + QTAILQ_INSERT_TAIL(&n->rsc_chains, chain, next);
> +
> + return chain;
> +}
> +
> +static ssize_t virtio_net_rsc_receive(NetClientState *nc,
> + const uint8_t *buf, size_t size)
> +{
> + uint16_t proto;
> + NetRscChain *chain;
> + struct eth_header *eth;
> + VirtIONet *n;
> +
> + n = qemu_get_nic_opaque(nc);
> + if (size < (n->guest_hdr_len + sizeof(struct eth_header))) {
Here should use host_hdr_len, no?
> + return virtio_net_do_receive(nc, buf, size);
> + }
> +
> + eth = (struct eth_header *)(buf + n->guest_hdr_len);
> + proto = htons(eth->h_proto);
> +
> + chain = virtio_net_rsc_lookup_chain(n, nc, proto);
> + if (!chain) {
> + return virtio_net_do_receive(nc, buf, size);
> + } else {
> + chain->stat.received++;
> + return virtio_net_rsc_receive4(chain, nc, buf, size);
> + }
> +}
> +
> +static ssize_t virtio_net_receive(NetClientState *nc,
> + const uint8_t *buf, size_t size)
> +{
> + VirtIONet *n;
> +
> + n = qemu_get_nic_opaque(nc);
> + if (n->host_features & (1ULL << VIRTIO_NET_F_GUEST_RSC)) {
> + return virtio_net_rsc_receive(nc, buf, size);
> + } else {
> + return virtio_net_do_receive(nc, buf, size);
> + }
> +}
> +
> static NetClientInfo net_virtio_info = {
> .type = NET_CLIENT_OPTIONS_KIND_NIC,
> .size = sizeof(NICState),
> @@ -1814,6 +2304,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> nc = qemu_get_queue(n->nic);
> nc->rxfilter_notify_enabled = 1;
>
> + QTAILQ_INIT(&n->rsc_chains);
> n->qdev = dev;
> register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
> virtio_net_save, virtio_net_load, n);
> @@ -1848,6 +2339,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
> g_free(n->vqs);
> qemu_del_nic(n->nic);
> virtio_cleanup(vdev);
> + virtio_net_rsc_cleanup(n);
> }
>
> static void virtio_net_instance_init(Object *obj)
> @@ -1909,6 +2401,10 @@ static Property virtio_net_properties[] = {
> TX_TIMER_INTERVAL),
> DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
> DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
> + DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
> + VIRTIO_NET_F_GUEST_RSC, false),
> + DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
> + VIRTIO_NET_RSC_INTERVAL),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 0cabdb6..e995190 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -59,6 +59,8 @@ typedef struct VirtIONet {
> VirtIONetQueue *vqs;
> VirtQueue *ctrl_vq;
> NICState *nic;
> + QTAILQ_HEAD(, NetRscChain) rsc_chains;
> + uint32_t rsc_timeout;
> uint32_t tx_timeout;
> int32_t tx_burst;
> uint32_t has_vnet_hdr;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 6a37065..a7cb84a 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -30,6 +30,8 @@
> (0x1ULL << VIRTIO_F_ANY_LAYOUT))
>
> struct VirtQueue;
> +struct VirtIONet;
> +typedef struct VirtIONet VirtIONet;
>
> static inline hwaddr vring_align(hwaddr addr,
> unsigned long align)
> @@ -128,6 +130,79 @@ typedef struct VirtioDeviceClass {
> int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
> } VirtioDeviceClass;
>
> +/* Coalesced packets type & status */
> +typedef enum {
> + RSC_COALESCE, /* Data been coalesced */
> + RSC_FINAL, /* Will terminate current connection */
> + RSC_NO_MATCH, /* No matched in the buffer pool */
> + RSC_BYPASS, /* Packet to be bypass, not tcp, tcp ctrl, etc */
> + RSC_WANT /* Data want to be coalesced */
> +} COALESCE_STATUS;
> +
> +typedef struct NetRscStat {
> + uint32_t received;
> + uint32_t coalesced;
> + uint32_t over_size;
> + uint32_t cache;
> + uint32_t empty_cache;
> + uint32_t no_match_cache;
> + uint32_t win_update;
> + uint32_t no_match;
> + uint32_t tcp_syn;
> + uint32_t tcp_ctrl_drain;
> + uint32_t dup_ack;
> + uint32_t dup_ack1;
> + uint32_t dup_ack2;
> + uint32_t pure_ack;
> + uint32_t ack_out_of_win;
> + uint32_t data_out_of_win;
> + uint32_t data_out_of_order;
> + uint32_t data_after_pure_ack;
> + uint32_t bypass_not_tcp;
> + uint32_t tcp_option;
> + uint32_t tcp_all_opt;
> + uint32_t ip_frag;
> + uint32_t ip_hacked;
> + uint32_t ip_option;
> + uint32_t purge_failed;
> + uint32_t drain_failed;
> + uint32_t final_failed;
> + int64_t timer;
> +} NetRscStat;
> +
> +/* Rsc unit general info used to checking if can coalescing */
> +typedef struct NetRscUnit {
> + void *ip; /* ip header */
> + uint16_t *ip_plen; /* data len pointer in ip header field */
> + struct tcp_header *tcp; /* tcp header */
> + uint16_t tcp_hdrlen; /* tcp header len */
> + uint16_t payload; /* pure payload without virtio/eth/ip/tcp */
> +} NetRscUnit;
> +
> +/* Coalesced segmant */
> +typedef struct NetRscSeg {
> + QTAILQ_ENTRY(NetRscSeg) next;
> + void *buf;
> + size_t size;
> + uint16_t packets;
> + uint16_t dup_ack;
> + bool is_coalesced; /* need recal ipv4 header checksum, mark here */
> + NetRscUnit unit;
> + NetClientState *nc;
> +} NetRscSeg;
> +
> +/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
> +typedef struct NetRscChain {
> + QTAILQ_ENTRY(NetRscChain) next;
> + VirtIONet *n; /* VirtIONet */
> + uint16_t proto;
> + uint8_t gso_type;
> + uint16_t max_payload;
> + QEMUTimer *drain_timer;
> + QTAILQ_HEAD(, NetRscSeg) buffers;
> + NetRscStat stat;
> +} NetRscChain;
> +
> void virtio_instance_init_common(Object *proxy_obj, void *data,
> size_t vdev_size, const char *vdev_name);
>
> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> index a78f33e..5b95762 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -55,6 +55,7 @@
> #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> +#define VIRTIO_NET_F_GUEST_RSC 24 /* Guest can coalesce tcp packets */
The comment should be "Guest can receive coalesced tcp packets" ?
>
> #ifndef VIRTIO_NET_NO_LEGACY
> #define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
next prev parent reply other threads:[~2016-05-30 4:20 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-28 16:37 [Qemu-devel] [ RFC Patch v6 0/2] Support Receive-Segment-Offload(RSC) for WHQL wexu
2016-05-28 16:37 ` [Qemu-devel] [ RFC Patch v6 1/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
2016-05-30 4:20 ` Jason Wang [this message]
2016-06-02 16:16 ` Wei Xu
2016-05-28 16:37 ` [Qemu-devel] [ RFC Patch v6 2/3] virtio-net rsc: support coalescing ipv6 " wexu
2016-05-30 4:25 ` Jason Wang
2016-06-02 16:17 ` Wei Xu
2016-05-28 16:37 ` [Qemu-devel] [ RFC Patch v6 3/3] virtio-net rsc: add 2 new rsc information fields to 'virtio_net_hdr' wexu
2016-05-30 5:57 ` Jason Wang
2016-06-02 16:23 ` Wei Xu
2016-05-30 4:22 ` [Qemu-devel] [ RFC Patch v6 0/2] Support Receive-Segment-Offload(RSC) for WHQL Jason Wang
2016-05-30 4:50 ` Wei Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=574BBF7E.4050609@redhat.com \
--to=jasowang@redhat.com \
--cc=dfleytma@redhat.com \
--cc=marcel@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=victork@redhat.com \
--cc=wexu@redhat.com \
--cc=yvugenfi@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).