qemu-devel.nongnu.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>,
	"eddie . dong" <eddie.dong@intel.com>,
	qemu devel <qemu-devel@nongnu.org>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] filter-rewriter: rewrite tcp packet to keep secondary connection
Date: Wed, 22 Jun 2016 14:34:28 +0800
Message-ID: <576A3174.2010905@redhat.com>
In-Reply-To: <576A020E.8040804@cn.fujitsu.com>



On 2016-06-22 11:12, Zhang Chen wrote:
>
>
> On 06/20/2016 08:14 PM, Dr. David Alan Gilbert wrote:
>> * Jason Wang (jasowang@redhat.com) wrote:
>>>
>>> On 2016-06-14 19:15, Zhang Chen wrote:
>>>> We will rewrite tcp packet secondary received and sent.
>>> More verbose please, e.g. which fields were rewritten and why.
>
> OK.
>
>>>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> ---
>>>>    net/filter-rewriter.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++--
>>>>    trace-events          |  3 ++
>>>>    2 files changed, 95 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
>>>> index 12f88c5..86a2f53 100644
>>>> --- a/net/filter-rewriter.c
>>>> +++ b/net/filter-rewriter.c
>>>> @@ -21,6 +21,7 @@
>>>>    #include "qemu/main-loop.h"
>>>>    #include "qemu/iov.h"
>>>>    #include "net/checksum.h"
>>>> +#include "trace.h"
>>>>    #define FILTER_COLO_REWRITER(obj) \
>>>>        OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
>>>> @@ -64,6 +65,75 @@ static int is_tcp_packet(Packet *pkt)
>>>>        }
>>>>    }
>>>> +static int handle_primary_tcp_pkt(NetFilterState *nf,
>>>> +                                  Connection *conn,
>>>> +                                  Packet *pkt)
>>>> +{
>>>> +    struct tcphdr *tcp_pkt;
>>>> +
>>>> +    tcp_pkt = (struct tcphdr *)pkt->transport_layer;
>>>> +
>>>> +    if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
>>> Why not use tracepoints directly?
>> Because trace can't cope with you having to do an allocation/free.
>>
>>>> +        char *sdebug, *ddebug;
>>>> +        sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
>>>> +        ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
>>>> +        fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
>>>> +                "  flags=%x\n", __func__, sdebug, ddebug,
>>>> +                ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
>>>> +                tcp_pkt->th_flags);
>> However, this should use the trace_ call to write the result even if it's
>> using trace_event_get_state to switch the whole block on/off.
>
> I will fix it in next version.
>
>>
>>>> +        g_free(sdebug);
>>>> +        g_free(ddebug);
>>>> +    }
>>>> +
>>>> +    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
>>>> +        /* save primary colo tcp packet seq */
>>>> +        conn->primary_seq = ntohl(tcp_pkt->th_ack) - 1;
>>> Looks like primary_seq will only be updated during handshake, I wonder how
>>> this works.
>
> OK.
> We assume that the COLO guest is a TCP server.
>
> First, the client starts a TCP handshake with
> pkt(seq=client_seq, ack=0, flag=SYN). The COLO primary guest receives this
> packet and mirrors it (filter-mirror) to the secondary guest, which
> receives it through filter-redirector.
> Then the primary guest responds with
> pkt(seq=primary_seq, ack=client_seq+1, flag=ACK|SYN) and the
> secondary guest responds with
> pkt(seq=secondary_seq, ack=client_seq+1, flag=ACK|SYN).
> At this point filter-rewriter saves secondary_seq in its TCP
> connection state.
> To finish the handshake, the client sends
> pkt(seq=client_seq+1, ack=primary_seq+1, flag=ACK).
> Here filter-rewriter can get primary_seq, rewrite the ack from
> primary_seq+1 to secondary_seq+1, and recalculate the checksum, so the
> secondary TCP connection stays valid.
>
> When we send/receive data packets, the client sends
> pkt(seq=client_seq+1+data_len, ack=primary_seq+1, flag=ACK|PSH).
> filter-rewriter rewrites the ack and sends the packet to the secondary guest.

If I read your code correctly, secondary_seq will only be updated during
the handshake. So the ack seq will always be the same for each packet
received by the secondary?

> The primary guest responds with
> pkt(seq=primary_seq+1, ack=client_seq+1+data_len, flag=ACK) and the
> secondary guest responds with
> pkt(seq=secondary_seq+1, ack=client_seq+1+data_len, flag=ACK).

Is ACK a must here?

> We rewrite the secondary guest's seq from secondary_seq+1 to primary_seq+1,
> so the TCP connection stays valid.

What if we have a large window and the server (guest) wants to send
more than one TCP packet? The code can only advance primary_seq when
we've received an ack, which seems wrong.
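
A minimal sketch of the concern, assuming the rewrite rule is exactly the one in the quoted handle_secondary_tcp_pkt() (seq is set to primary_seq + 1 for every pure-ACK segment). The helper names fixed_rewrite() and segments_collide() are hypothetical and model only the seq assignment, not the real filter code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted rewrite rule: every pure-ACK segment
 * from the secondary gets seq = primary_seq + 1, regardless of the seq
 * the secondary actually used.
 */
static uint32_t fixed_rewrite(uint32_t primary_seq, uint32_t secondary_pkt_seq)
{
    (void)secondary_pkt_seq; /* ignored by the rule, which is the problem */
    return primary_seq + 1;
}

/* Returns 1 if two distinct back-to-back segments end up with the same seq. */
static int segments_collide(uint32_t primary_seq,
                            uint32_t first_seq, uint32_t second_seq)
{
    assert(first_seq != second_seq);
    return fixed_rewrite(primary_seq, first_seq) ==
           fixed_rewrite(primary_seq, second_seq);
}
```

With primary_seq = 1000, two segments that the secondary sends at seq 5001 and 6461 before any client ACK arrives are both rewritten to 1001 under this model.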

So it will be very tricky if you don't track the offset. Basically, what I
suggest is rather simple:

1) calculate offset during handshake, e.g. offset = secondary_seq_syn - primary_seq_syn
2) in handle_primary_tcp_pkt: tcp_pkt->th_ack += offset;
3) in handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset;

This looks like it can handle more cases and is more robust than the current code?
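
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct seq_fields is a hypothetical stand-in for the seq/ack fields of struct tcphdr (kept in network byte order), struct conn_offset is a hypothetical replacement for the primary_seq/secondary_seq pair in Connection, and the function names are made up. Checksum recalculation (net_checksum_calculate() in the patch) is still needed after each rewrite and is left out here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the seq/ack fields of struct tcphdr. */
struct seq_fields {
    uint32_t th_seq; /* network byte order */
    uint32_t th_ack; /* network byte order */
};

/* Hypothetical per-connection state: one offset instead of two seqs. */
struct conn_offset {
    uint32_t offset; /* secondary SYN seq - primary SYN seq, mod 2^32 */
};

/* Step 1: compute the offset once, from the two SYN|ACK packets. */
static void record_offset(struct conn_offset *c,
                          uint32_t primary_seq_syn,
                          uint32_t secondary_seq_syn)
{
    assert(c != NULL);
    /* Unsigned subtraction wraps, which matches TCP sequence arithmetic. */
    c->offset = secondary_seq_syn - primary_seq_syn;
}

/* Step 2: packet heading to the secondary, shift ack into its numbering. */
static void rewrite_ack_to_secondary(const struct conn_offset *c,
                                     struct seq_fields *h)
{
    assert(c != NULL && h != NULL);
    h->th_ack = htonl(ntohl(h->th_ack) + c->offset);
}

/* Step 3: packet sent by the secondary, shift seq back to the primary's. */
static void rewrite_seq_from_secondary(const struct conn_offset *c,
                                       struct seq_fields *h)
{
    assert(c != NULL && h != NULL);
    h->th_seq = htonl(ntohl(h->th_seq) - c->offset);
}
```

Because the offset is applied to whatever seq/ack value the packet carries, every segment is translated independently, so several segments in flight keep distinct sequence numbers. The byte-order conversions are spelled out because "th_ack += offset" on the raw header field would operate on a network-order value.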

>
>
>> This code really needs commenting to make it clear what's going on; each
>> of these functions should say which way the packet is going (e.g.
>>   'handle packets to the primary from the secondary') - there's a lot
>> of packet flows going on and without the comments it's very hard to follow.
>
> Thanks..I will add comments in next version.
>
>>
>> I think this could be because we're fixing up the sequence numbers on the
>> secondary once we've received the first response from the primary, so it's
>> only the first packet of each connection that the primary has to do this on -
>> but hmm I'm not sure without some comments.
>
> Yes,you are right.
>
>
>
>>
>> Dave
>>
>>>> +
>>>> +        /* adjust tcp seq to make secondary guest handle it */
>>>> +        tcp_pkt->th_ack = htonl(conn->secondary_seq + 1);
>>> I'm not sure this can work for all cases. I believe we should also rewrite
>>> seq here. And to me, a better approach is to track the offset of seq between
>>> pri and sec during handshake and rewrite both ack and seq based on this
>>> offset.
>
> In the vast majority of cases, the COLO guest is a TCP server.
> The client kernel and the guest kernel keep the TCP seq consistent, so
> we don't need to rewrite seq here. Rewriting the ack and the
> checksum is enough to keep the secondary TCP connection working. If
> the COLO guest is a TCP client, maybe we can wait for colo-compare
> to do a checkpoint (the secondary hasn't sent its TCP packet in time).
>
>
> Thanks
> Zhang Chen
>
>
>>>> +        net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int handle_secondary_tcp_pkt(NetFilterState *nf,
>>>> +                                    Connection *conn,
>>>> +                                    Packet *pkt)
>>>> +{
>>>> +    struct tcphdr *tcp_pkt;
>>>> +
>>>> +    tcp_pkt = (struct tcphdr *)pkt->transport_layer;
>>>> +
>>>> +    if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
>>>> +        char *sdebug, *ddebug;
>>>> +        sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
>>>> +        ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
>>>> +        printf("handle_secondary_tcp_pkt conn->secondary_seq = %u,\n",
>>>> +               conn->secondary_seq);
>>>> +        printf("handle_secondary_tcp_pkt conn->primary_seq = %u,\n",
>>>> +               conn->primary_seq);
>>>> +        fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
>>>> +                "  flags=%x\n", __func__, sdebug, ddebug,
>>>> +                ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
>>>> +                tcp_pkt->th_flags);
>>>> +        g_free(sdebug);
>>>> +        g_free(ddebug);
>>>> +    }
>>>> +
>>>> +    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
>>>> +        /* save client's seq */
>>>> +        conn->secondary_seq = ntohl(tcp_pkt->th_seq);
>>>> +    }
>>>> +
>>>> +    if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) {
>>>> +        tcp_pkt->th_seq = htonl(conn->primary_seq + 1);
>>>> +        net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
>>>>                                             NetClientState *sender,
>>>>                                             unsigned flags,
>>>> @@ -106,10 +176,30 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
>>>>            if (sender == nf->netdev) {
>>>>                /* This packet is sent by netdev itself */
>>>>                /* NET_FILTER_DIRECTION_TX */
>>>> -            /* handle_primary_tcp_pkt */
>>>> +            if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
>>>> +                qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>> +                (const uint8_t *)pkt->data, pkt->size, NULL);
>>>> +                packet_destroy(pkt, NULL);
>>>> +                pkt = NULL;
>>>> +                /*
>>>> +                 * We block the packet here,after rewrite pkt
>>>> +                 * and will send it
>>>> +                 */
>>>> +                return 1;
>>>> +            }
>>>>            } else {
>>>>                /* NET_FILTER_DIRECTION_RX */
>>>> -            /* handle_secondary_tcp_pkt */
>>>> +            if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
>>>> +                qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>> +                (const uint8_t *)pkt->data, pkt->size, NULL);
>>>> +                packet_destroy(pkt, NULL);
>>>> +                pkt = NULL;
>>>> +                /*
>>>> +                 * We block the packet here,after rewrite pkt
>>>> +                 * and will send it
>>>> +                 */
>>>> +                return 1;
>>>> +            }
>>>>            }
>>>>        }
>>>> diff --git a/trace-events b/trace-events
>>>> index 6686cdf..5d798c6 100644
>>>> --- a/trace-events
>>>> +++ b/trace-events
>>>> @@ -1927,3 +1927,6 @@ colo_compare_icmp_miscompare_mtu(const char *sta, int size) ": %s  %d"
>>>>    colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, const char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, spkt size = %d, ip_src = %s, ip_dst = %s"
>>>>    colo_old_packet_check_found(int64_t old_time) "%" PRId64
>>>>    colo_compare_miscompare(void) ""
>>>> +
>>>> +# net/filter-rewriter.c
>>>> +colo_filter_rewriter_debug(void) ""
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>>
>> .
>>
>
