From: Jason Wang <jasowang@redhat.com>
To: peter.maydell@linaro.org, qemu-devel@nongnu.org
Cc: Zhang Chen <zhangckid@gmail.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	Zhang Chen <chen.zhang@intel.com>,
	Jason Wang <jasowang@redhat.com>
Subject: [Qemu-devel] [PULL 01/26] filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table
Date: Mon, 15 Oct 2018 16:46:01 +0800	[thread overview]
Message-ID: <1539593186-32183-2-git-send-email-jasowang@redhat.com> (raw)
In-Reply-To: <1539593186-32183-1-git-send-email-jasowang@redhat.com>

From: Zhang Chen <zhangckid@gmail.com>

We add an almost complete TCP state machine in filter-rewriter; only
TCPS_LISTEN is omitted, and the FIN states of a VM active close are
simplified. This simplification is safe because the guest kernel also
tracks the TCP state and waits the 2MSL time itself: if the client
resends the FIN packet, the guest resends the last ACK, so we need not
wait the 2MSL time in filter-rewriter.

Previously, after a network connection was closed, we did not clear its
related resources in connection_track_table, which leads to a memory leak.

Let's track the state of each network connection; once it is closed, its
related resources are cleared up.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c            |   2 +-
 net/colo.h            |   9 ++---
 net/filter-rewriter.c | 109 +++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 104 insertions(+), 16 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 6dda4ed..97c8fc9 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -137,7 +137,7 @@ Connection *connection_new(ConnectionKey *key)
     conn->ip_proto = key->ip_proto;
     conn->processing = false;
     conn->offset = 0;
-    conn->syn_flag = 0;
+    conn->tcp_state = TCPS_CLOSED;
     conn->pack = 0;
     conn->sack = 0;
     g_queue_init(&conn->primary_list);
diff --git a/net/colo.h b/net/colo.h
index da6c36d..0277e0e 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -18,6 +18,7 @@
 #include "slirp/slirp.h"
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
+#include "slirp/tcp.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -81,11 +82,9 @@ typedef struct Connection {
     uint32_t sack;
     /* offset = secondary_seq - primary_seq */
     tcp_seq  offset;
-    /*
-     * we use this flag update offset func
-     * run once in independent tcp connection
-     */
-    int syn_flag;
+
+    int tcp_state; /* TCP FSM state */
+    tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index f584e4e..dd323fa 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -59,9 +59,9 @@ static int is_tcp_packet(Packet *pkt)
 }
 
 /* handle tcp packet from primary guest */
-static int handle_primary_tcp_pkt(NetFilterState *nf,
+static int handle_primary_tcp_pkt(RewriterState *rf,
                                   Connection *conn,
-                                  Packet *pkt)
+                                  Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;
 
@@ -74,23 +74,28 @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
         trace_colo_filter_rewriter_conn_offset(conn->offset);
     }
 
+    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN)) &&
+        conn->tcp_state == TCPS_SYN_SENT) {
+        conn->tcp_state = TCPS_ESTABLISHED;
+    }
+
     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
         /*
          * we use this flag update offset func
          * run once in independent tcp connection
          */
-        conn->syn_flag = 1;
+        conn->tcp_state = TCPS_SYN_RECEIVED;
     }
 
     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
-        if (conn->syn_flag) {
+        if (conn->tcp_state == TCPS_SYN_RECEIVED) {
             /*
              * offset = secondary_seq - primary seq
              * ack packet sent by guest from primary node,
              * so we use th_ack - 1 get primary_seq
              */
             conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
-            conn->syn_flag = 0;
+            conn->tcp_state = TCPS_ESTABLISHED;
         }
         if (conn->offset) {
             /* handle packets to the secondary from the primary */
@@ -99,15 +104,66 @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
             net_checksum_calculate((uint8_t *)pkt->data + pkt->vnet_hdr_len,
                                    pkt->size - pkt->vnet_hdr_len);
         }
+
+        /*
+         * Passive close step 3
+         */
+        if ((conn->tcp_state == TCPS_LAST_ACK) &&
+            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
+            conn->tcp_state = TCPS_CLOSED;
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
+    }
+
+    if ((tcp_pkt->th_flags & TH_FIN) == TH_FIN) {
+        /*
+         * Passive close.
+         * Step 1:
+         * The *server* side of this connection is the VM; the *client*
+         * tries to close the connection. We enter the CLOSE_WAIT state.
+         *
+         * Step 2:
+         * In this step we enter the LAST_ACK state.
+         *
+         * We got a 'fin=1, ack=1' packet from the server side; we need
+         * to record the seq of that 'fin=1, ack=1' packet.
+         *
+         * Step 3:
+         * We got an 'ack=1' packet from the client side that acks the
+         * 'fin=1, ack=1' packet from the server side. From this point on
+         * there will be no more packets on this connection, unless an
+         * error happens on the path between the filter object and the
+         * vNIC; in that rare case we can still create a new connection,
+         * so it is safe to remove the connection from
+         * connection_track_table.
+         */
+        if (conn->tcp_state == TCPS_ESTABLISHED) {
+            conn->tcp_state = TCPS_CLOSE_WAIT;
+        }
+
+        /*
+         * Active close step 2.
+         */
+        if (conn->tcp_state == TCPS_FIN_WAIT_1) {
+            conn->tcp_state = TCPS_TIME_WAIT;
+            /*
+             * To simplify the implementation, we need not wait the 2MSL
+             * time in filter-rewriter: the guest kernel tracks the TCP
+             * state and waits the 2MSL time itself, so if the client
+             * resends the FIN packet, the guest will resend the last ACK.
+             */
+            conn->tcp_state = TCPS_CLOSED;
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
     }
 
     return 0;
 }
 
 /* handle tcp packet from secondary guest */
-static int handle_secondary_tcp_pkt(NetFilterState *nf,
+static int handle_secondary_tcp_pkt(RewriterState *rf,
                                     Connection *conn,
-                                    Packet *pkt)
+                                    Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;
 
@@ -121,7 +177,8 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         trace_colo_filter_rewriter_conn_offset(conn->offset);
     }
 
-    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
+    if (conn->tcp_state == TCPS_SYN_RECEIVED &&
+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
         /*
          * save offset = secondary_seq and then
          * in handle_primary_tcp_pkt make offset
@@ -130,6 +187,12 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         conn->offset = ntohl(tcp_pkt->th_seq);
     }
 
+    /* VM active connect */
+    if (conn->tcp_state == TCPS_CLOSED &&
+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+        conn->tcp_state = TCPS_SYN_SENT;
+    }
+
     if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) {
         /* Only need to adjust seq while offset is Non-zero */
         if (conn->offset) {
@@ -141,6 +204,32 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         }
     }
 
+    /*
+     * Passive close step 2:
+     */
+    if (conn->tcp_state == TCPS_CLOSE_WAIT &&
+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
+        conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
+        conn->tcp_state = TCPS_LAST_ACK;
+    }
+
+    /*
+     * Active close.
+     *
+     * Step 1:
+     * The *server* side of this connection is the VM; the *server*
+     * tries to close the connection, so we enter the FIN_WAIT_1 state.
+     *
+     * Step 2:
+     * The peer enters the CLOSE_WAIT state. On our side we simplify
+     * away the TCPS_FIN_WAIT_2, TCPS_TIME_WAIT and
+     * TCPS_CLOSING states.
+     */
+    if (conn->tcp_state == TCPS_ESTABLISHED &&
+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == TH_FIN) {
+        conn->tcp_state = TCPS_FIN_WAIT_1;
+    }
+
     return 0;
 }
 
@@ -190,7 +279,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 
         if (sender == nf->netdev) {
             /* NET_FILTER_DIRECTION_TX */
-            if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
+            if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) {
                 qemu_net_queue_send(s->incoming_queue, sender, 0,
                 (const uint8_t *)pkt->data, pkt->size, NULL);
                 packet_destroy(pkt, NULL);
@@ -203,7 +292,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
             }
         } else {
             /* NET_FILTER_DIRECTION_RX */
-            if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
+            if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) {
                 qemu_net_queue_send(s->incoming_queue, sender, 0,
                 (const uint8_t *)pkt->data, pkt->size, NULL);
                 packet_destroy(pkt, NULL);
-- 
2.5.0

