From: Jason Wang <jasowang@redhat.com>
To: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
zhangchen.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com
Cc: xuquan8@huawei.com, qemu-devel@nongnu.org, pss.wulizhen@huawei.com
Subject: Re: [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table
Date: Thu, 23 Feb 2017 12:16:50 +0800 [thread overview]
Message-ID: <b792c68e-d67a-ac24-26ea-aaf24f1f9437@redhat.com> (raw)
In-Reply-To: <58AD5125.4010306@huawei.com>
On 2017年02月22日 16:51, Hailiang Zhang wrote:
> On 2017/2/22 16:45, Hailiang Zhang wrote:
>> On 2017/2/22 16:07, Jason Wang wrote:
>>>
>>>
>>> On 2017年02月22日 11:46, zhanghailiang wrote:
>>>> After a net connection is closed, we didn't clear its releated
>>>> resources
>>>> in connection_track_table, which will lead to memory leak.
>>>
>>> Not a real leak but would lead reset of hash table if too many closed
>>> connections.
>>>
>>
>> Yes, you are right, there will be lots of stale connection data in
>> hash table
>> if we don't remove it while it is been closed. Which
Ok, so let's come up with a better title of the patch.
>>
>>>>
>>>> Let't track the state of net connection, if it is closed, its related
>>>> resources will be cleared up.
>>>
>>> The issue is the state were tracked partially, do we need a full state
>>> machine here?
>>>
>>
>> Not, IMHO, we only care about the last state of it, because, we will
>> do nothing
>> even if we track the intermedial states.
Well, you care at least syn state too. Without a complete state machine,
it's very hard to track even partial state I believe. And you will fail
to track some state transition for sure which makes the code fragile.
>>
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> ---
>>>> net/colo.h | 4 +++
>>>> net/filter-rewriter.c | 70
>>>> +++++++++++++++++++++++++++++++++++++++++++++------
>>>> 2 files changed, 67 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/net/colo.h b/net/colo.h
>>>> index 7c524f3..cd9027f 100644
>>>> --- a/net/colo.h
>>>> +++ b/net/colo.h
>>>> @@ -18,6 +18,7 @@
>>>> #include "slirp/slirp.h"
>>>> #include "qemu/jhash.h"
>>>> #include "qemu/timer.h"
>>>> +#include "slirp/tcp.h"
>>>>
>>>> #define HASHTABLE_MAX_SIZE 16384
>>>>
>>>> @@ -69,6 +70,9 @@ typedef struct Connection {
>>>> * run once in independent tcp connection
>>>> */
>>>> int syn_flag;
>>>> +
>>>> + int tcp_state; /* TCP FSM state */
>>>> + tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
>>>> } Connection;
>>>>
>>>> uint32_t connection_key_hash(const void *opaque);
>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
>>>> index c4ab91c..7e7ec35 100644
>>>> --- a/net/filter-rewriter.c
>>>> +++ b/net/filter-rewriter.c
>>>> @@ -60,9 +60,9 @@ static int is_tcp_packet(Packet *pkt)
>>>> }
>>>>
>>>> /* handle tcp packet from primary guest */
>>>> -static int handle_primary_tcp_pkt(NetFilterState *nf,
>>>> +static int handle_primary_tcp_pkt(RewriterState *rf,
>>>> Connection *conn,
>>>> - Packet *pkt)
>>>> + Packet *pkt, ConnectionKey *key)
>>>> {
>>>> struct tcphdr *tcp_pkt;
>>>>
>>>> @@ -97,15 +97,45 @@ static int
>>>> handle_primary_tcp_pkt(NetFilterState *nf,
>>>> tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) +
>>>> conn->offset);
>>>>
>>>> net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
>>>> + /*
>>>> + * Case 1:
>>>> + * The *server* side of this connect is VM, *client* tries
>>>> to close
>>>> + * the connection.
>>>> + *
>>>> + * We got 'ack=1' packets from client side, it acks
>>>> 'fin=1, ack=1'
>>>> + * packet from server side. From this point, we can ensure
>>>> that there
>>>> + * will be no packets in the connection, except that, some
>>>> errors
>>>> + * happen between the path of 'filter object' and vNIC, if
>>>> this rare
>>>> + * case really happen, we can still create a new connection,
>>>> + * So it is safe to remove the connection from
>>>> connection_track_table.
>>>> + *
>>>> + */
>>>> + if ((conn->tcp_state == TCPS_LAST_ACK) &&
>>>> + (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
>>>> + fprintf(stderr, "Remove conn "
>>>
>>> Can this even compile?
>>>
>>
>> Oops, i forgot to remove it, will remove it in next version.
>>
>>>> + g_hash_table_remove(rf->connection_track_table, key);
>>>> + }
>>>> + }
>>>> + /*
>>>> + * Case 2:
>>>> + * The *server* side of this connect is VM, *server* tries to
>>>> close
>>>> + * the connection.
>>>> + *
>>>> + * We got 'fin=1, ack=1' packet from client side, we need to
>>>> + * record the seq of 'fin=1, ack=1' packet.
>>>> + */
>>>> + if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK |
>>>> TH_FIN)) {
>>>> + conn->fin_ack_seq = htonl(tcp_pkt->th_seq);
>>>> + conn->tcp_state = TCPS_LAST_ACK;
>>>> }
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> /* handle tcp packet from secondary guest */
>>>> -static int handle_secondary_tcp_pkt(NetFilterState *nf,
>>>> +static int handle_secondary_tcp_pkt(RewriterState *rf,
>>>> Connection *conn,
>>>> - Packet *pkt)
>>>> + Packet *pkt, ConnectionKey *key)
>>>> {
>>>> struct tcphdr *tcp_pkt;
>>>>
>>>> @@ -133,8 +163,34 @@ static int
>>>> handle_secondary_tcp_pkt(NetFilterState *nf,
>>>> tcp_pkt->th_seq = htonl(ntohl(tcp_pkt->th_seq) -
>>>> conn->offset);
>>>>
>>>> net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
>>>> + /*
>>>> + * Case 2:
>>>> + * The *server* side of this connect is VM, *server* tries
>>>> to close
>>>> + * the connection.
>>>> + *
>>>> + * We got 'ack=1' packets from server side, it acks
>>>> 'fin=1, ack=1'
>>>> + * packet from client side. Like Case 1, there should be
>>>> no packets
>>>> + * in the connection from now know, But the difference
>>>> here is
>>>> + * if the packet is lost, We will get the resent
>>>> 'fin=1,ack=1' packet.
>>>> + * TODO: Fix above case.
>>>> + */
>>>> + if ((conn->tcp_state == TCPS_LAST_ACK) &&
>>>> + (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
>>>> + g_hash_table_remove(rf->connection_track_table, key);
>>>> + }
>>>> + }
>>>> + /*
>>>> + * Case 1:
>>>> + * The *server* side of this connect is VM, *client* tries to
>>>> close
>>>> + * the connection.
>>>> + *
>>>> + * We got 'fin=1, ack=1' packet from server side, we need to
>>>> + * record the seq of 'fin=1, ack=1' packet.
>>>> + */
>>>> + if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK |
>>>> TH_FIN)) {
>>>> + conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
>>>> + conn->tcp_state = TCPS_LAST_ACK;
>>>
>>> I thought the tcp_state should store the state of TCP from the view of
>>> secondary VM? So TCPS_LAST_ACK is wrong and bring lots of confusion.
>>> And
>>> the handle of active close needs more states here. E.g if connection is
>>> in FIN_WAIT_2, the connection is only half closed, remote peer can
>>> still
>>> send packet to us unless we receive a FIN.
>>>
>>
>> Yes, i know what you mean, actually, here, we try to only track the last
>> two steps for closing a connection, that is 'fin=1,ack=1,seq=2,ack=u+1'
> ^
> 'FIN=1,ACK=1,seq=w,ack=u+1'
>
>> and 'ack=1,seq=u+1,ack=w+1', because if we get a 'fin=1,ack=1', we can
> ^ 'ACK=1,seq=u+1,ack=w+1' ^ 'FIN=1,ACK=1'
>
>> ensure that the 'fin=1,seq=u' packet has been posted.
>>
> ^ 'FIN=1,seq=u'
That's just the case I'm saying, the transition above is in fact:
secondary(ESTABLISHED)
secondary(FIN_WAIT_1): -> FIN,seq=w,ack=u+1 -> :remote
secondary(FIN_WAIT_2): <- seq=u+1,ack=w+1 <- :remote
So we are in fact in FIN_WAIT_2, which means the connection is only half
closed, but your patch will treat this as fully closed connection and
will remove the connection from the hashtable.
What's more I don't think we can decide passive or active close by:
+ if ((tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
Since both cases will send FIN,ACK for sure.
>
>> Another reason is we may can't track the 'fin=1,seq=u' packet while
>> we start COLO while one connection is closing, which the
>> 'fin=1,seq=u' packet
>> has been posted.
>>
>> Actually, here, if we start COLO while one connection is closing,
>> which the
>> 'fin=1,ack=1' has been posted, we can only track 'ack=1' packet. In this
>
> ^ 'FIN=1,ACK=1'
>
> Sorry for the typo. :)
>
>> case, the connection will be left in hash table for ever though it is
>> harmless.
>> Any ideas for this case ?
Sorry I don't follow the question.
>>
>> For the above codes question, i'd like to change tcp_state to
>> tap_closing_wait,
>> is it OK ?
You mean "tcp_closing_wait". I think we need first figure out if we can
track the state correctly first.
Thanks
>>
>> Thanks.
>> Hailiang
>>
>>> Thanks
>>>
>>>> }
>>>> -
>>>> return 0;
>>>> }
>>>>
>>>> @@ -178,7 +234,7 @@ static ssize_t
>>>> colo_rewriter_receive_iov(NetFilterState *nf,
>>>>
>>>> if (sender == nf->netdev) {
>>>> /* NET_FILTER_DIRECTION_TX */
>>>> - if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
>>>> + if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) {
>>>> qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>> (const uint8_t *)pkt->data, pkt->size, NULL);
>>>> packet_destroy(pkt, NULL);
>>>> @@ -191,7 +247,7 @@ static ssize_t
>>>> colo_rewriter_receive_iov(NetFilterState *nf,
>>>> }
>>>> } else {
>>>> /* NET_FILTER_DIRECTION_RX */
>>>> - if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
>>>> + if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) {
>>>> qemu_net_queue_send(s->incoming_queue, sender, 0,
>>>> (const uint8_t *)pkt->data, pkt->size, NULL);
>>>> packet_destroy(pkt, NULL);
>>>
>>>
>>> .
>>>
>
>
next prev parent reply other threads:[~2017-02-23 4:17 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-22 3:46 [Qemu-devel] [PATCH v2 0/3] filter-rewriter: fix two bugs and one optimization zhanghailiang
2017-02-22 3:46 ` [Qemu-devel] [PATCH v2 1/3] net/colo: fix memory double free error zhanghailiang
2017-02-22 8:39 ` Zhang Chen
2017-02-22 3:46 ` [Qemu-devel] [PATCH v2 2/3] filter-rewriter: fix memory leak for connection in connection_track_table zhanghailiang
2017-02-22 8:07 ` Jason Wang
2017-02-22 8:45 ` Hailiang Zhang
2017-02-22 8:51 ` Hailiang Zhang
2017-02-23 4:16 ` Jason Wang [this message]
2017-02-27 3:11 ` Hailiang Zhang
2017-02-27 3:40 ` Jason Wang
2017-02-27 4:09 ` Hailiang Zhang
2017-02-27 5:35 ` Jason Wang
2017-02-27 6:53 ` Hailiang Zhang
2017-02-27 9:05 ` Jason Wang
2017-02-27 10:29 ` Hailiang Zhang
2017-02-28 3:14 ` Jason Wang
2017-02-22 3:46 ` [Qemu-devel] [PATCH v2 3/3] filter-rewriter: skip net_checksum_calculate() while offset = 0 zhanghailiang
2017-02-24 8:08 ` Zhang Chen
2017-02-24 8:23 ` Zhang Chen
2017-02-27 1:36 ` Hailiang Zhang
2017-02-27 3:44 ` Zhang Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b792c68e-d67a-ac24-26ea-aaf24f1f9437@redhat.com \
--to=jasowang@redhat.com \
--cc=lizhijian@cn.fujitsu.com \
--cc=pss.wulizhen@huawei.com \
--cc=qemu-devel@nongnu.org \
--cc=xuquan8@huawei.com \
--cc=zhang.zhanghailiang@huawei.com \
--cc=zhangchen.fnst@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).