From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: Jason Wang <jasowang@redhat.com>,
	zhangchen.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com
Cc: xuquan8@huawei.com, qemu-devel@nongnu.org, pss.wulizhen@huawei.com
Subject: Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
Date: Mon, 27 Feb 2017 12:09:44 +0800
Message-ID: <58B3A688.5040707@huawei.com>
In-Reply-To: <c7883f1d-99ed-d082-3eee-7e0c641257e6@redhat.com>

On 2017/2/27 11:40, Jason Wang wrote:
>
>
> On 2017-02-27 11:11, Hailiang Zhang wrote:
>> On 2017/2/23 12:16, Jason Wang wrote:
>>>
>>>
>>> On 2017-02-22 16:51, Hailiang Zhang wrote:
>>>> On 2017/2/22 16:45, Hailiang Zhang wrote:
>>>>> On 2017/2/22 16:07, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-02-22 11:46, zhanghailiang wrote:
>>>>>>> After a net connection is closed, we didn't clear its related
>>>>>>> resources in connection_track_table, which will lead to a memory
>>>>>>> leak.
>>>>>>
>>>>>> Not a real leak, but it would lead to a reset of the hash table if
>>>>>> there are too many closed connections.
>>>>>>
>>>>>
>>>>> Yes, you are right, there will be lots of stale connection data in the
>>>>> hash table if we don't remove it when the connection is closed.
>>>
>>>
>>> Ok, so let's come up with a better title of the patch.
>>>
>>
>> OK.
>>
>>>>>
>>>>>>>
>>>>>>> Let's track the state of the net connection; if it is closed, its
>>>>>>> related resources will be cleaned up.
>>>>>>
>>>>>> The issue is that the state is only tracked partially; do we need a
>>>>>> full state machine here?
>>>>>>
>>>>>
>>>>> No, IMHO we only care about its last state, because we will do nothing
>>>>> even if we track the intermediate states.
>>>
>>> Well, you care about at least the SYN state too. Without a complete state
>>> machine, it's very hard to track even partial state, I believe. And you
>>> will fail to track some state transitions for sure, which makes the code
>>> fragile.
>>>
>>
>> Agreed, but here things are a little different. There are some extreme
>> cases in which we can't track the complete process of closing a
>> connection. For example (I have explained this below; it seems you didn't
>> get it ;) ): if the VM is already running before we want to put it into
>> the COLO FT state, some connections may already exist. In the extreme
>> case, the VM goes into COLO state while some connections are in a
>> half-closing state, and we can only track the latter half of the close in
>> the filter-rewriter and colo-compare objects.
>>
>>>>>
>>>>>>>
>>>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>>>> ---
>>>>>>>       net/colo.h            |  4 +++
>>>>>>>       net/filter-rewriter.c | 70
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++------
>>>>>>>       2 files changed, 67 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/colo.h b/net/colo.h
>>>>>>> index 7c524f3..cd9027f 100644
>>>>>>> --- a/net/colo.h
>>>>>>> +++ b/net/colo.h
>>>>>>> @@ -18,6 +18,7 @@
>>>>>>>       #include "slirp/slirp.h"
>>>>>>>       #include "qemu/jhash.h"
>>>>>>>       #include "qemu/timer.h"
>>>>>>> +#include "slirp/tcp.h"
>>>>>>>
>>>>>>>       #define HASHTABLE_MAX_SIZE 16384
>>>>>>>
>>>>>>> @@ -69,6 +70,9 @@ typedef struct Connection {
>>>>>>>            * run once in independent tcp connection
>>>>>>>            */
>>>>>>>           int syn_flag;
>>>>>>> +
>>>>>>> +    int tcp_state; /* TCP FSM state */
>>>>>>> +    tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
>>>>>>>       } Connection;
>>>>>>>
>>>>>>>       uint32_t connection_key_hash(const void *opaque);
>>>>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
>>>>>>> index c4ab91c..7e7ec35 100644
>>>>>>> --- a/net/filter-rewriter.c
>>>>>>> +++ b/net/filter-rewriter.c
>>>>>>> @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt)
>>>>>>>       }
>>>>>>>
>>>>>>>       /* handle tcp packet from primary guest */
>>>>>>> -static int handle_primary_tcp_pkt(NetFilterState *nf,
>>>>>>> +static int handle_primary_tcp_pkt(RewriterState *rf,
>>>>>>>                                         Connection *conn,
>>>>>>> -                                  Packet *pkt)
>>>>>>> +                                  Packet *pkt, ConnectionKey *key)
>>>>>>>       {
>>>>>>>           struct tcphdr *tcp_pkt;
>>>>>>>
>>>>>>> @@ -97,15 +97,45 @@ static int
>>>>>>> handle_primary_tcp_pkt(NetFilterState *nf,
>>>>>>>               tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) +
>>>>>>> conn->offset);
>>>>>>>
>>>>>>>               net_checksum_calculate((uint8_t *)pkt->data,
>>>>>>> pkt->size);
>>>>>>> +        /*
>>>>>>> +         * Case 1:
>>>>>>> +         * The *server* side of this connect is VM, *client* tries
>>>>>>> to close
>>>>>>> +         * the connection.
>>>>>>> +         *
>>>>>>> +         * We got 'ack=1' packets from client side, it acks
>>>>>>> 'fin=1, ack=1'
>>>>>>> +         * packet from server side. From this point, we can ensure
>>>>>>> that there
>>>>>>> +         * will be no packets in the connection, except that, some
>>>>>>> errors
>>>>>>> +         * happen between the path of 'filter object' and vNIC, if
>>>>>>> this rare
>>>>>>> +         * case really happen, we can still create a new
>>>>>>> connection,
>>>>>>> +         * So it is safe to remove the connection from
>>>>>>> connection_track_table.
>>>>>>> +         *
>>>>>>> +         */
>>>>>>> +        if ((conn->tcp_state == TCPS_LAST_ACK) &&
>>>>>>> +            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
>>>>>>> +            fprintf(stderr, "Remove conn "
>>>>>>
>>>>>> Can this even compile?
>>>>>>
>>>>>
>>>>> Oops, i forgot to remove it, will remove it in next version.
>>>>>
>>>>>>> + g_hash_table_remove(rf->connection_track_table, key);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    /*
>>>>>>> +     * Case 2:
>>>>>>> +     * The *server* side of this connect is VM, *server* tries to
>>>>>>> close
>>>>>>> +     * the connection.
>>>>>>> +     *
>>>>>>> +     * We got 'fin=1, ack=1' packet from client side, we need to
>>>>>>> +     * record the seq of 'fin=1, ack=1' packet.
>>>>>>> +     */
>>>>>>> +    if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK |
>>>>>>> TH_FIN)) {
>>>>>>> +        conn->fin_ack_seq = htonl(tcp_pkt->th_seq);
>>>>>>> +        conn->tcp_state = TCPS_LAST_ACK;
>>>>>>>           }
>>>>>>>
>>>>>>>           return 0;
>>>>>>>       }
>>>>>>>
>>>>>>>       /* handle tcp packet from secondary guest */
>>>>>>> -static int handle_secondary_tcp_pkt(NetFilterState *nf,
>>>>>>> +static int handle_secondary_tcp_pkt(RewriterState *rf,
>>>>>>>                                           Connection *conn,
>>>>>>> -                                    Packet *pkt)
>>>>>>> +                                    Packet *pkt, ConnectionKey
>>>>>>> *key)
>>>>>>>       {
>>>>>>>           struct tcphdr *tcp_pkt;
>>>>>>>
>>>>>>> @@ -133,8 +163,34 @@ static int
>>>>>>> handle_secondary_tcp_pkt(NetFilterState *nf,
>>>>>>>               tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) -
>>>>>>> conn->offset);
>>>>>>>
>>>>>>>               net_checksum_calculate((uint8_t *)pkt->data,
>>>>>>> pkt->size);
>>>>>>> +        /*
>>>>>>> +         * Case 2:
>>>>>>> +         * The *server* side of this connect is VM, *server* tries
>>>>>>> to close
>>>>>>> +         * the connection.
>>>>>>> +         *
>>>>>>> +         * We got 'ack=1' packets from server side, it acks
>>>>>>> 'fin=1, ack=1'
>>>>>>> +         * packet from client side. Like Case 1, there should be
>>>>>>> no packets
>>>>>>> +         * in the connection from now know, But the difference
>>>>>>> here is
>>>>>>> +         * if the packet is lost, We will get the resent
>>>>>>> 'fin=1,ack=1' packet.
>>>>>>> +         * TODO: Fix above case.
>>>>>>> +         */
>>>>>>> +        if ((conn->tcp_state == TCPS_LAST_ACK) &&
>>>>>>> +            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
>>>>>>> + g_hash_table_remove(rf->connection_track_table, key);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    /*
>>>>>>> +     * Case 1:
>>>>>>> +     * The *server* side of this connect is VM, *client* tries to
>>>>>>> close
>>>>>>> +     * the connection.
>>>>>>> +     *
>>>>>>> +     * We got 'fin=1, ack=1' packet from server side, we need to
>>>>>>> +     * record the seq of 'fin=1, ack=1' packet.
>>>>>>> +     */
>>>>>>> +    if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK |
>>>>>>> TH_FIN)) {
>>>>>>> +        conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
>>>>>>> +        conn->tcp_state = TCPS_LAST_ACK;
>>>>>>
>>>>>> I thought tcp_state should store the state of the TCP connection from
>>>>>> the point of view of the secondary VM? So TCPS_LAST_ACK is wrong and
>>>>>> brings lots of confusion. And the handling of active close needs more
>>>>>> states here. E.g. if the connection is in FIN_WAIT_2, the connection
>>>>>> is only half closed; the remote peer can still send packets to us
>>>>>> until we receive a FIN.
>>>>>>
>>>>>
>>>>> Yes, I know what you mean. Actually, here we try to only track the last
>>>>> two steps of closing a connection, that is
>>>>> 'fin=1,ack=1,seq=2,ack=u+1'
>>>>                                                ^
>>>> 'FIN=1,ACK=1,seq=w,ack=u+1'
>>>>
>>>>> and 'ack=1,seq=u+1,ack=w+1', because if we get a 'fin=1,ack=1', we can
>>>>        ^ 'ACK=1,seq=u+1,ack=w+1'                     ^ 'FIN=1,ACK=1'
>>>>
>>>>> ensure that the 'fin=1,seq=u' packet has been posted.
>>>>>
>>>>                     ^ 'FIN=1,seq=u'
>>>
>>> That's just the case I'm talking about; the transition above is in fact:
>>>
>>> secondary(ESTABLISHED)
>>> secondary(FIN_WAIT_1): ->  FIN,seq=w,ack=u+1   -> :remote
>>> secondary(FIN_WAIT_2): <-  seq=u+1,ack=w+1     <- :remote
>>>
>>> So we are in fact in FIN_WAIT_2, which means the connection is only half
>>> closed, but your patch will treat this as a fully closed connection and
>>> will remove it from the hashtable.
>>>
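
One way to capture the state you are describing would be to track a FIN in
each direction separately, so that a connection is only dropped once both
FINs have been seen and acked. A rough sketch, just to make the idea
concrete; the types and the helper below are hypothetical, not part of the
patch:

    typedef enum {
        CLOSE_NONE,      /* no FIN seen in this direction          */
        CLOSE_FIN_SENT,  /* FIN seen, waiting for the matching ACK */
        CLOSE_FIN_ACKED, /* FIN acknowledged by the peer           */
    } CloseState;

    typedef struct CloseTrack {
        CloseState primary;   /* close initiated by the primary side   */
        CloseState secondary; /* close initiated by the secondary side */
    } CloseTrack;

    static bool conn_fully_closed(const CloseTrack *t)
    {
        /* FIN_WAIT_2 maps to one direction acked but not the other:
         * the peer may still legally send data, so the entry must not
         * be dropped from the hash table yet. */
        return t->primary == CLOSE_FIN_ACKED &&
               t->secondary == CLOSE_FIN_ACKED;
    }

With something like this, the half-closed FIN_WAIT_2 case above stays in
the hash table until the second FIN has been acked.
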
>>
>> Er, here we track the last two states, 'FIN=1, ACK=1' and 'ACK=1' (which
>> acks the 'FIN=1,ACK=1' packet). We will remove the connection when we get
>> the 'ACK=1' packet, so is that enough?
>
> But the connection is not closed in fact, no? It's legal for the remote to
> continue sending tons of packets to us even after this.
>

Er, I'm a little confused. Here, for the server side, I think that after we
get the 'ACK=1,seq=u+1,ack=w+1' packet the connection is closed, so I remove
it from the hash table. Is that wrong?

Client:                                    Server:

ESTABLISHED|                               |
           | ->  FIN=1,seq=u ->            |
FIN_WAIT_1 |                               |
           | <-  ACK=1,seq=v,ack=u+1 <-    |
FIN_WAIT_2 |                               |CLOSE_WAIT
           | <- FIN=1,ACK=1,seq=w,ack=u+1<-|
           |                               |LAST_ACK
           | ->  ACK=1,seq=u+1,ack=w+1 ->  |
TIME_WAIT  |                               |CLOSED
CLOSED     |                               |
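
To make it concrete, below is a condensed sketch of the check the patch
performs for this exchange. The helper is hypothetical: the patch really
splits the two halves across handle_primary_tcp_pkt() and
handle_secondary_tcp_pkt(), and Connection, TH_FIN/TH_ACK and TCPS_LAST_ACK
come from net/colo.h and slirp/tcp.h:

    /* One side sends 'FIN=1,ACK=1,seq=w'; the entry can be removed once
     * the other side acknowledges that FIN with 'ack=w+1'. */
    static bool colo_conn_close_done(Connection *conn,
                                     const struct tcphdr *tcp_pkt)
    {
        if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
            /* Record seq=w of the FIN,ACK so the final ACK can be matched. */
            conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
            conn->tcp_state = TCPS_LAST_ACK;
            return false;
        }
        /* Final ACK of the four-way close: ack=w+1 acknowledges the FIN. */
        return conn->tcp_state == TCPS_LAST_ACK &&
               ntohl(tcp_pkt->th_ack) == conn->fin_ack_seq + 1;
    }

The caller would then g_hash_table_remove() the entry when this returns
true.
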

>>
>>> What's more, I don't think we can decide between passive and active close by:
>>>
>>>
>>> +    if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
>>>
>>> since both cases will send FIN,ACK for sure.
>>>
>>
>> I didn't quite understand. Here we have tracked the closing request no
>> matter whether it comes from the server side (passive close?) or the
>> client side (active close?). You can refer to the 'Case 1' and 'Case 2'
>> comments in the code.
>
> I think you need to differentiate them, since passive close is much simpler,
> and it seems that your code may work only in that case.
>
>>
>> Here, it seems that we can't track the case in which both sides send
>> closing requests at the same time; in that case, there are only 'FIN=1'
>> and 'ACK=1' packets.
>>
>
> Yes, RFC allows this.
>
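
If we do go the per-direction route sketched earlier in this mail, the
simultaneous-close case would need no special handling: each observed FIN
records its own seq, and each ACK may complete the peer's FIN. A
hypothetical sketch, reusing the CloseState enum from above:

    static void track_close_pkt(CloseState *sender_dir, tcp_seq *sender_fin,
                                CloseState *peer_dir, tcp_seq peer_fin,
                                const struct tcphdr *tcp_pkt)
    {
        if (tcp_pkt->th_flags & TH_FIN) {
            /* A FIN may arrive before the other side's FIN is acked. */
            *sender_fin = ntohl(tcp_pkt->th_seq);
            *sender_dir = CLOSE_FIN_SENT;
        }
        if ((tcp_pkt->th_flags & TH_ACK) && *peer_dir == CLOSE_FIN_SENT &&
            ntohl(tcp_pkt->th_ack) == peer_fin + 1) {
            *peer_dir = CLOSE_FIN_ACKED;
        }
    }

The equality check on the ack number is as simplistic as the one in the
patch, so the retransmission concern above would still apply.
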

Hmm, I'd like to remove this patch from this series and send it as a
separate patch after all of the above questions have been resolved.
How about the other patches?

Thanks,
Hailiang


> Thanks
>
>>
>> Thanks.
>> Hailiang
>>
>>>>
>>>>> Another reason is that we may not be able to track the 'fin=1,seq=u'
>>>>> packet if we start COLO while a connection is closing and the
>>>>> 'fin=1,seq=u' packet has already been posted.
>>>>>
>>>>> Actually, here, if we start COLO while a connection is closing and the
>>>>> 'fin=1,ack=1' has already been posted, we can only track the 'ack=1'
>>>>> packet. In this
>>>>
>>>>    ^ 'FIN=1,ACK=1'
>>>>
>>>> Sorry for the typo. :)
>>>>
>>>>> case, the connection will be left in the hash table forever, though it
>>>>> is harmless. Any ideas for this case?
>>>
>>> Sorry I don't follow the question.
>>>
>>>>>
>>>>> For the above code question, I'd like to change tcp_state to
>>>>> tap_closing_wait. Is that OK?
>>>
>>> You mean "tcp_closing_wait". I think we need to figure out whether we can
>>> track the state correctly first.
>>>
>>> Thanks
>>>
>>>>>
>>>>> Thanks.
>>>>> Hailiang
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>           }
>>>>>>> -
>>>>>>>           return 0;
>>>>>>>       }
>>>>>>>
>>>>>>> @@ -178,7 +234,7 @@ static ssize_t
>>>>>>> colo_rewriter_receive_iov(NetFilterState *nf,
>>>>>>>
>>>>>>>               if (sender == nf->netdev) {
>>>>>>>                   /* NET_FILTER_DIRECTION_TX */
>>>>>>> -            if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
>>>>>>> +            if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) {
>>>>>>> qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>>>>>                       (const uint8_t *)pkt->data, pkt->size, NULL);
>>>>>>>                       packet_destroy(pkt, NULL);
>>>>>>> @@ -191,7 +247,7 @@ static ssize_t
>>>>>>> colo_rewriter_receive_iov(NetFilterState *nf,
>>>>>>>                   }
>>>>>>>               } else {
>>>>>>>                   /* NET_FILTER_DIRECTION_RX */
>>>>>>> -            if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
>>>>>>> +            if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) {
>>>>>>> qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>>>>>                       (const uint8_t *)pkt->data, pkt->size, NULL);
>>>>>>>                       packet_destroy(pkt, NULL);
>>>>>>
>>>>>>
>>>>>> .
>>>>>>
>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
>
> .
>

