From: Wei Xu <wexu@redhat.com>
To: Jason Wang <jasowang@redhat.com>, qemu-devel@nongnu.org
Cc: Wei Xu <wei@redhat.com>,
victork@redhat.com, mst@redhat.com, yvugenfi@redhat.com,
marcel@redhat.com, dfleytma@redhat.com
Subject: Re: [Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing
Date: Mon, 01 Feb 2016 16:29:19 +0800 [thread overview]
Message-ID: <56AF175F.5060302@redhat.com> (raw)
In-Reply-To: <56AEF951.4010506@redhat.com>
On 02/01/2016 02:21 PM, Jason Wang wrote:
>
> On 02/01/2016 02:13 AM, wexu@redhat.com wrote:
>> From: Wei Xu <wei@wei-thinkpad.nay.redhat.com>
>>
>> Since this feature also needs to support IPv6, and there are
>> some protocol specific differences difference for IPv4/6 in the header,
>> so try to make the interface to be general.
>>
>> IPv4/6 should set up both the new and old IP/TCP header before invoking
>> TCP coalescing, and should also tell the real payload.
>>
>> The main handler of TCP includes TCP window update, duplicated ACK check
>> and the real data coalescing if the new segment passed invalid filter
>> and is identified as an expected one.
>>
>> An expected segment means:
>> 1. Segment is within current window and the sequence is the expected one.
>> 2. ACK of the segment is in the valid window.
>> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>> this is to notify upper layer TCP starting retransmission due to the spec.
>>
>> Signed-off-by: Wei Xu <wexu@redhat.com>
>> ---
>> hw/net/virtio-net.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 124 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index cfbac6d..4f77fbe 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -41,6 +41,10 @@
>>
>> #define VIRTIO_HEADER 12 /* Virtio net header size */
>> #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
>> +#define TCP_WINDOW 65535
> The name is confusing, how about TCP_MAX_WINDOW_SIZE ?
Sounds better, will take it in.
>
>> +
>> +/* IPv4 max payload, 16 bits in the header */
>> +#define MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>>
>> #define MAX_VIRTIO_IP_PAYLOAD (65535 + IP_OFFSET)
>>
>> @@ -1670,13 +1674,130 @@ out:
>> return 0;
>> }
>>
>> +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
>> + const uint8_t *buf, struct tcp_header *n_tcp,
>> + struct tcp_header *o_tcp)
>> +{
>> + uint32_t nack, oack;
>> + uint16_t nwin, owin;
>> +
>> + nack = htonl(n_tcp->th_ack);
>> + nwin = htons(n_tcp->th_win);
>> + oack = htonl(o_tcp->th_ack);
>> + owin = htons(o_tcp->th_win);
>> +
>> + if ((nack - oack) >= TCP_WINDOW) {
>> + return RSC_FINAL;
>> + } else if (nack == oack) {
>> + /* duplicated ack or window probe */
>> + if (nwin == owin) {
>> + /* duplicated ack, add dup ack count due to whql test up to 1 */
>> +
>> + if (seg->dup_ack_count == 0) {
>> + seg->dup_ack_count++;
>> + return RSC_COALESCE;
>> + } else {
>> + /* Spec says should send it directly */
>> + return RSC_FINAL;
>> + }
>> + } else {
>> + /* Coalesce window update */
> Need we flush this immediately consider it was a window update?
The flowchart in the spec says this can be coalesced as normal.
https://msdn.microsoft.com/en-us/library/windows/hardware/jj853325%28v=vs.85%29.aspx
>
>> + o_tcp->th_win = n_tcp->th_win;
>> + return RSC_COALESCE;
>> + }
>> + } else {
> What if nack < oack here?
That should happen, the modulo-232 arithmetic check at the begin of this function will keep the ack is in the current window.
>
>> + /* pure ack, update ack */
>> + o_tcp->th_ack = n_tcp->th_ack;
>> + return RSC_COALESCE;
>> + }
>> +}
>> +
>> +static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain *chain, NetRscSeg *seg,
>> + const uint8_t *buf, struct tcp_header *n_tcp, uint16_t n_tcp_len,
>> + uint16_t n_data, struct tcp_header *o_tcp, uint16_t o_tcp_len,
>> + uint16_t o_data, uint16_t *p_ip_len, uint16_t max_data)
>> +{
>> + void *data;
>> + uint16_t o_ip_len;
>> + uint32_t nseq, oseq;
>> +
>> + o_ip_len = htons(*p_ip_len);
>> + nseq = htonl(n_tcp->th_seq);
>> + oseq = htonl(o_tcp->th_seq);
>> +
> Need to the tcp header check here. And looks like we need also check more:
>
> - Flags
> - Data offset
> - URG pointer
ok.
>
>> + /* Ignore packet with more/larger tcp options */
>> + if (n_tcp_len > o_tcp_len) {
> What if n_tcp_len < o_tcp_len ?
This maybe a bug, it's better to bypass it.
>
>> + return RSC_FINAL;
>> + }
>> +
>> + /* out of order or retransmitted. */
>> + if ((nseq - oseq) > TCP_WINDOW) {
>> + return RSC_FINAL;
>> + }
>> +
>> + data = ((uint8_t *)n_tcp) + n_tcp_len;
>> + if (nseq == oseq) {
>> + if ((0 == o_data) && n_data) {
>> + /* From no payload to payload, normal case, not a dup ack or etc */
>> + goto coalesce;
>> + } else {
>> + return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
>> + }
>> + } else if ((nseq - oseq) != o_data) {
>> + /* Not a consistent packet, out of order */
>> + return RSC_FINAL;
>> + } else {
>> +coalesce:
>> + if ((o_ip_len + n_data) > max_data) {
>> + return RSC_FINAL;
>> + }
>> +
>> + /* Here comes the right data, the payload lengh in v4/v6 is different,
>> + so use the field value to update */
>> + *p_ip_len = htons(o_ip_len + n_data); /* Update new data len */
>> + o_tcp->th_offset_flags = n_tcp->th_offset_flags; /* Bring 'PUSH' big */
> Is it correct? How about URG pointer?
It's better to bypass URG packets too.
>
>> + o_tcp->th_ack = n_tcp->th_ack;
>> + o_tcp->th_win = n_tcp->th_win;
>> +
>> + memmove(seg->buf + seg->size, data, n_data);
>> + seg->size += n_data;
>> + return RSC_COALESCE;
>> + }
>> +}
>>
>> static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
>> NetRscSeg *seg, const uint8_t *buf, size_t size)
>> {
>> - /* This real part of this function will be introduced in next patch, just
>> - * return a 'final' to feed the compilation. */
>> - return RSC_FINAL;
>> + uint16_t o_ip_len, n_ip_len; /* len in ip header field */
>> + uint16_t n_ip_hdrlen, o_ip_hdrlen; /* ipv4 header len */
>> + uint16_t n_tcp_len, o_tcp_len; /* tcp header len */
>> + uint16_t o_data, n_data; /* payload without virtio/eth/ip/tcp */
>> + struct ip_header *n_ip, *o_ip;
>> + struct tcp_header *n_tcp, *o_tcp;
>> +
>> + n_ip = (struct ip_header *)(buf + IP_OFFSET);
>> + n_ip_hdrlen = ((0xF & n_ip->ip_ver_len) << 2);
>> + n_ip_len = htons(n_ip->ip_len);
>> + n_tcp = (struct tcp_header *)(((uint8_t *)n_ip) + n_ip_hdrlen);
>> + n_tcp_len = (htons(n_tcp->th_offset_flags) & 0xF000) >> 10;
>> + n_data = n_ip_len - n_ip_hdrlen - n_tcp_len;
>> +
>> + o_ip = (struct ip_header *)(seg->buf + IP_OFFSET);
>> + o_ip_hdrlen = ((0xF & o_ip->ip_ver_len) << 2);
>> + o_ip_len = htons(o_ip->ip_len);
>> + o_tcp = (struct tcp_header *)(((uint8_t *)o_ip) + o_ip_hdrlen);
>> + o_tcp_len = (htons(o_tcp->th_offset_flags) & 0xF000) >> 10;
>> + o_data = o_ip_len - o_ip_hdrlen - o_tcp_len;
> Any chance to eliminate the above codes duplication into a helper? To
> simplify the codes, you may want just store two pointers in seg (one is
> ip header another is tcp header).
OK, will investigate it.
>
>> +
>> + if ((n_ip->ip_src ^ o_ip->ip_src) || (n_ip->ip_dst ^ o_ip->ip_dst)
>> + || (n_tcp->th_sport ^ o_tcp->th_sport)
>> + || (n_tcp->th_dport ^ o_tcp->th_dport)) {
> Lots of other fields need to be checked here:
>
> - header length
> - TOS
> - Flag
> - identification
>
> I think we could not try to merge the packet with if any of the above
> fields do not match. Since you split the coalescing function into
> different layers, tcp header check should be postponed.
OK.
>
>> + return RSC_NO_MATCH;
>> + }
>> +
>> + return virtio_net_rsc_coalesce_tcp(chain, seg, buf,
>> + n_tcp, n_tcp_len, n_data, o_tcp, o_tcp_len,
>> + o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
> Since you've passed seg and buf, do we really need such a huge parameter
> list? Looks like some of the header parsing has already been done in
> virtio_net_rsc_coalesce_tcp().
ok, will investigate it.
>
>> }
>>
>> static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
>
next prev parent reply other threads:[~2016-02-01 8:29 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-31 18:13 [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest wexu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 01/10] virtio-net rsc: Data structure, 'Segment', 'Chain' and 'Status' wexu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 02/10] virtio-net rsc: Initilize & Cleanup wexu
2016-01-31 18:47 ` Michael S. Tsirkin
2016-02-01 3:56 ` Wei Xu
2016-02-01 3:32 ` Jason Wang
2016-02-01 7:46 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4 wexu
2016-01-31 18:50 ` Michael S. Tsirkin
2016-02-01 3:40 ` Wei Xu
2016-02-01 5:55 ` Jason Wang
2016-02-01 8:02 ` Wei Xu
2016-02-01 9:22 ` Jason Wang
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing wexu
2016-02-01 6:21 ` Jason Wang
2016-02-01 8:29 ` Wei Xu [this message]
2016-02-01 9:29 ` Jason Wang
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool wexu
2016-02-01 6:28 ` Jason Wang
2016-02-01 8:39 ` Wei Xu
2016-02-01 9:31 ` Jason Wang
2016-02-01 13:31 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 06/10] virtio-net rsc: IPv4 checksum wexu
2016-02-01 6:31 ` Jason Wang
2016-02-01 8:40 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 07/10] virtio-net rsc: Checking TCP flag and drain specific connection packets wexu
2016-02-01 6:44 ` Jason Wang
2016-02-01 8:44 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 08/10] virtio-net rsc: Sanity check & More bypass cases check wexu
2016-02-01 6:58 ` Jason Wang
2016-02-01 8:46 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 09/10] virtio-net rsc: Add IPv6 support wexu
2016-02-01 7:14 ` Jason Wang
2016-02-01 8:49 ` Wei Xu
2016-01-31 18:13 ` [Qemu-devel] [RFC Patch v2 10/10] virtio-net rsc: Add Receive Segment Coalesce statistics wexu
2016-02-01 7:16 ` Jason Wang
2016-02-01 8:50 ` Wei Xu
2016-01-31 19:03 ` [Qemu-devel] [RFC v2 0/10] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest Michael S. Tsirkin
2016-02-01 3:23 ` Jason Wang
2016-02-01 3:42 ` Wei Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56AF175F.5060302@redhat.com \
--to=wexu@redhat.com \
--cc=dfleytma@redhat.com \
--cc=jasowang@redhat.com \
--cc=marcel@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=victork@redhat.com \
--cc=wei@redhat.com \
--cc=yvugenfi@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.