From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46238) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFbkF-0005qb-Mv for qemu-devel@nongnu.org; Wed, 22 Jun 2016 02:34:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bFbkC-0003FE-C6 for qemu-devel@nongnu.org; Wed, 22 Jun 2016 02:34:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34510) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFbkC-0003Ev-3v for qemu-devel@nongnu.org; Wed, 22 Jun 2016 02:34:36 -0400 References: <1465902912-27527-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> <1465902912-27527-4-git-send-email-zhangchen.fnst@cn.fujitsu.com> <57678CDA.7020700@redhat.com> <20160620121402.GC2891@work-vm> <576A020E.8040804@cn.fujitsu.com> From: Jason Wang Message-ID: <576A3174.2010905@redhat.com> Date: Wed, 22 Jun 2016 14:34:28 +0800 MIME-Version: 1.0 In-Reply-To: <576A020E.8040804@cn.fujitsu.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [RFC PATCH 3/3] filter-rewriter: rewrite tcp packet to keep secondary connection List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Zhang Chen , "Dr. David Alan Gilbert" Cc: Yang Hongyang , "eddie . dong" , qemu devel , Li Zhijian , zhanghailiang On 2016=E5=B9=B406=E6=9C=8822=E6=97=A5 11:12, Zhang Chen wrote: > > > On 06/20/2016 08:14 PM, Dr. David Alan Gilbert wrote: >> * Jason Wang (jasowang@redhat.com) wrote: >>> >>> On 2016=E5=B9=B406=E6=9C=8814=E6=97=A5 19:15, Zhang Chen wrote: >>>> We will rewrite tcp packet secondary received and sent. >>> More verbose please. E.g which fields were rewrote and why. > > OK. > >>>> Signed-off-by: Zhang Chen >>>> Signed-off-by: Li Zhijian >>>> Signed-off-by: Wen Congyang >>>> --- >>>> net/filter-rewriter.c | 94=20 >>>> +++++++++++++++++++++++++++++++++++++++++++++++++-- >>>> trace-events | 3 ++ >>>> 2 files changed, 95 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c >>>> index 12f88c5..86a2f53 100644 >>>> --- a/net/filter-rewriter.c >>>> +++ b/net/filter-rewriter.c >>>> @@ -21,6 +21,7 @@ >>>> #include "qemu/main-loop.h" >>>> #include "qemu/iov.h" >>>> #include "net/checksum.h" >>>> +#include "trace.h" >>>> #define FILTER_COLO_REWRITER(obj) \ >>>> OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER) >>>> @@ -64,6 +65,75 @@ static int is_tcp_packet(Packet *pkt) >>>> } >>>> } >>>> +static int handle_primary_tcp_pkt(NetFilterState *nf, >>>> + Connection *conn, >>>> + Packet *pkt) >>>> +{ >>>> + struct tcphdr *tcp_pkt; >>>> + >>>> + tcp_pkt =3D (struct tcphdr *)pkt->transport_layer; >>>> + >>>> + if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) { >>> Why not use tracepoints directly? >> Because trace can't cope with you having to do an allocation/free. >> >>>> + char *sdebug, *ddebug; >>>> + sdebug =3D strdup(inet_ntoa(pkt->ip->ip_src)); >>>> + ddebug =3D strdup(inet_ntoa(pkt->ip->ip_dst)); >>>> + fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=3D%u/%u" >>>> + " flags=3D%x\n", __func__, sdebug, ddebug, >>>> + ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack), >>>> + tcp_pkt->th_flags); >> However, this should use the trace_ call to write the result even if=20 >> it's >> using trace_event_get_state to switch the whole block on/off. > > I will fix it in next version. > >> >>>> + g_free(sdebug); >>>> + g_free(ddebug); >>>> + } >>>> + >>>> + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) =3D=3D TH_ACK)) { >>>> + /* save primary colo tcp packet seq */ >>>> + conn->primary_seq =3D ntohl(tcp_pkt->th_ack) - 1; >>> Looks like primary_seq will only be updated during handshake, I=20 >>> wonder how >>> this works. > > OK. > We assume that colo guest is a tcp server. > > Firstly, client start a tcp handshake. the packet's seq=3Dclient_seq, > ack=3D0,flag=3DSYN. COLO primary guest get this pkt and mirror(filter-m= irror) > to secondary guest, secondary get it use filter-redirector. > Then,primary guest response=20 > pkt(seq=3Dprimary_seq,ack=3Dclient_seq+1,flag=3DACK|SYN). > secondary guest response=20 > pkt(seq=3Dsecondary_seq,ack=3Dclient_seq+1,flag=3DACK|SYN). > In here,we use filter-rewriter save the secondary_seq to it's tcp=20 > connection. > Finally handshake,client send=20 > pkt(seq=3Dclient_seq+1,ack=3Dprimary_seq+1,flag=3DACK). > Here,filter-rewriter can get primary_seq, and rewrite ack from=20 > primary_seq+1 > to secondary_seq+1, recalculate checksum. So the secondary tcp connecti= on > kept good. > > When we send/recv packet. > client send=20 > pkt(seq=3Dclient_seq+1+data_len,ack=3Dprimary_seq+1,flag=3DACK|PSH). > filter-rewriter rewrite ack and send to secondary guest. If I read your code correctly, secondary_seq will only be updated during=20 handshake. So the ack seq will always be same for each packet received=20 by secondary? > primary guest response=20 > pkt(seq=3Dprimary_seq+1,ack=3Dclient_seq+1+data_len,flag=3DACK) > secondary guest response=20 > pkt(seq=3Dsecondary_seq+1,ack=3Dclient_seq+1+data_len,flag=3DACK) Is ACK a must here? > we rewrite secondary guest seq from secondary_seq+1 to primary_seq+1. > So tcp connection kept good. What if, consider we have a large window, so server(guest) want to send=20 more than one TCP packets? The code can only advance primary_seq when=20 we've received an ack which seems wrong. So it will be very tricky if you don't track offset. Basically, what I=20 suggest is rather simple: 1) calculate offset during handshake, e.g offset =3D secondary_seq_syn -=20 primary_seq_syn 2) in handle_primary_tcp_pkt: tcp_pkt->th_ack +=3D offset; 3) in handle_secondary_tcp_pkt: tcp_pkt->th_seq -=3D offset; Looks like this can handle more cases and more robust than current code? > > >> This code really needs commenting to make it see what's going on; each >> of these functions should say which way the packet is going (e.g. >> 'handle packets to the primary from the secondary') - there's a lot >> of packet flows going on and without the comments it's very hard to=20 >> follow. > > Thanks..I will add comments in next version. > >> >> I think this could be because we're fixing up the sequence numbers on=20 >> the >> secondary once we've received the first response from the primary, so=20 >> it's >> only the first packet of each connection that the primary has to do=20 >> this on - >> but hmm I'm not sure without some comments. > > Yes,you are right. > > > >> >> Dave >> >>>> + >>>> + /* adjust tcp seq to make secondary guest handle it */ >>>> + tcp_pkt->th_ack =3D htonl(conn->secondary_seq + 1); >>> I'm not sure this can work for all cases. I believe we should also=20 >>> rewrite >>> seq here. And to me, a better approach is to track the offset of seq=20 >>> between >>> pri and sec during handshake and rewrite both ack and seq based on th= is >>> offset. > > In the vast majority of cases, colo guest is a tcp server. > client kernel and guest kernel make the tcp seq work good. > we don't need rewrite seq here. we just need rewrite ack > and checksum can make secondary tcp connection work. If > colo guest is a tcp client,maybe we can wait colo-compare > do a checkpoint(secondary haven't send tcp packet in time). > > > Thanks > Zhang Chen > > >>>> + net_checksum_calculate((uint8_t *)pkt->data, pkt->size); >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int handle_secondary_tcp_pkt(NetFilterState *nf, >>>> + Connection *conn, >>>> + Packet *pkt) >>>> +{ >>>> + struct tcphdr *tcp_pkt; >>>> + >>>> + tcp_pkt =3D (struct tcphdr *)pkt->transport_layer; >>>> + >>>> + if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) { >>>> + char *sdebug, *ddebug; >>>> + sdebug =3D strdup(inet_ntoa(pkt->ip->ip_src)); >>>> + ddebug =3D strdup(inet_ntoa(pkt->ip->ip_dst)); >>>> + printf("handle_secondary_tcp_pkt conn->secondary_seq =3D=20 >>>> %u,\n", >>>> + conn->secondary_seq); >>>> + printf("handle_secondary_tcp_pkt conn->primary_seq =3D %u,\= n", >>>> + conn->primary_seq); >>>> + fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=3D%u/%u" >>>> + " flags=3D%x\n", __func__, sdebug, ddebug, >>>> + ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack), >>>> + tcp_pkt->th_flags); >>>> + g_free(sdebug); >>>> + g_free(ddebug); >>>> + } >>>> + >>>> + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) =3D=3D (TH_ACK |=20 >>>> TH_SYN))) { >>>> + /* save client's seq */ >>>> + conn->secondary_seq =3D ntohl(tcp_pkt->th_seq); >>>> + } >>>> + >>>> + if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) =3D=3D TH_ACK) { >>>> + tcp_pkt->th_seq =3D htonl(conn->primary_seq + 1); >>>> + net_checksum_calculate((uint8_t *)pkt->data, pkt->size); >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, >>>> NetClientState *sender, >>>> unsigned flags, >>>> @@ -106,10 +176,30 @@ static ssize_t=20 >>>> colo_rewriter_receive_iov(NetFilterState *nf, >>>> if (sender =3D=3D nf->netdev) { >>>> /* This packet is sent by netdev itself */ >>>> /* NET_FILTER_DIRECTION_TX */ >>>> - /* handle_primary_tcp_pkt */ >>>> + if (!handle_primary_tcp_pkt(nf, conn, pkt)) { >>>> + qemu_net_queue_send(s->incoming_queue, sender, 0, >>>> + (const uint8_t *)pkt->data, pkt->size, NULL); >>>> + packet_destroy(pkt, NULL); >>>> + pkt =3D NULL; >>>> + /* >>>> + * We block the packet here,after rewrite pkt >>>> + * and will send it >>>> + */ >>>> + return 1; >>>> + } >>>> } else { >>>> /* NET_FILTER_DIRECTION_RX */ >>>> - /* handle_secondary_tcp_pkt */ >>>> + if (!handle_secondary_tcp_pkt(nf, conn, pkt)) { >>>> + qemu_net_queue_send(s->incoming_queue, sender, 0, >>>> + (const uint8_t *)pkt->data, pkt->size, NULL); >>>> + packet_destroy(pkt, NULL); >>>> + pkt =3D NULL; >>>> + /* >>>> + * We block the packet here,after rewrite pkt >>>> + * and will send it >>>> + */ >>>> + return 1; >>>> + } >>>> } >>>> } >>>> diff --git a/trace-events b/trace-events >>>> index 6686cdf..5d798c6 100644 >>>> --- a/trace-events >>>> +++ b/trace-events >>>> @@ -1927,3 +1927,6 @@ colo_compare_icmp_miscompare_mtu(const char=20 >>>> *sta, int size) ": %s %d" >>>> colo_compare_ip_info(int psize, const char *sta, const char=20 >>>> *stb, int ssize, const char *stc, const char *std) "ppkt size =3D %d= ,=20 >>>> ip_src =3D %s, ip_dst =3D %s, spkt size =3D %d, ip_src =3D %s, ip_ds= t =3D %s" >>>> colo_old_packet_check_found(int64_t old_time) "%" PRId64 >>>> colo_compare_miscompare(void) "" >>>> + >>>> +# net/filter-rewriter.c >>>> +colo_filter_rewriter_debug(void) "" >> --=20 >> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >> >> >> . >> >