[PATCH 0/4] COLO net and runstate bugfix/optimization

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] COLO net and runstate bugfix/optimization
@ 2022-03-09  8:38 Zhang Chen
  2022-03-09  8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Zhang Chen @ 2022-03-09  8:38 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev

This series fix some COLO related issues in internal stress testing.

Zhang Chen (4):
  softmmu/runstate.c: add RunStateTransition support form COLO to
    PRELAUNCH
  net/colo: Fix a "double free" crash to clear the conn_list
  net/colo.c: No need to track conn_list for filter-rewriter
  net/colo.c: fix segmentation fault when packet is not parsed correctly

 net/colo-compare.c    |  2 +-
 net/colo.c            | 11 +++++++++--
 net/filter-rewriter.c |  2 +-
 net/trace-events      |  1 +
 softmmu/runstate.c    |  1 +
 5 files changed, 13 insertions(+), 4 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
  2022-03-09  8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
@ 2022-03-09  8:38 ` Zhang Chen
  2022-03-09  8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Zhang Chen @ 2022-03-09  8:38 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu

If the checkpoint occurs when the guest finishes restarting
but has not started running, the runstate_set() may reject
the transition from COLO to PRELAUNCH with the crash log:

{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'

Long-term testing says that it's pretty safe.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 softmmu/runstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index e0d869b21a..c021c56338 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -127,6 +127,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
     { RUN_STATE_COLO, RUN_STATE_RUNNING },
+    { RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
     { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
 
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-09  8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
  2022-03-09  8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
@ 2022-03-09  8:38 ` Zhang Chen
  2022-03-21  3:05   ` lizhijian
  2022-03-09  8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
  2022-03-09  8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
  3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09  8:38 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu

We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:

15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1]    15195 abort (core dumped)  qemu-system-x86_64 ....

This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:

GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                       GEqualFunc key_equal_func,
                       GDestroyNotify key_destroy_func,
                       GDestroyNotify value_destroy_func);

The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().

It's unnecessary because we clear the conn_list explicitly later,
and it's buggy when other agents try to call connection_get()
with the same connection_track_table.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo-compare.c    | 2 +-
 net/filter-rewriter.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62554b5b3c..ab054cfd21 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
 
     colo_compare_iothread(s);
 
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index bf05023dc3..c18c4c2019 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-09  8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
@ 2022-03-21  3:05   ` lizhijian
  2022-03-28  9:13     ` Zhang, Chen
  0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-03-21  3:05 UTC (permalink / raw)
  To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: qemu-dev, Like Xu



On 09/03/2022 16:38, Zhang Chen wrote:
> We notice the QEMU may crash when the guest has too many
> incoming network connections with the following log:
>
> 15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
> free(): invalid pointer
> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>
> This is because we create the s->connection_track_table with
> g_hash_table_new_full() which is defined as:
>
> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>                         GEqualFunc key_equal_func,
>                         GDestroyNotify key_destroy_func,
>                         GDestroyNotify value_destroy_func);
>
> The fourth parameter connection_destroy() will be called to free the
> memory allocated for all 'Connection' values in the hashtable when
> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>
> It's unnecessary because we clear the conn_list explicitly later,
> and it's buggy when other agents try to call connection_get()
> with the same connection_track_table.
>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>   net/colo-compare.c    | 2 +-
>   net/filter-rewriter.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 62554b5b3c..ab054cfd21 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
>       s->connection_track_table = g_hash_table_new_full(connection_key_hash,
>                                                         connection_key_equal,
>                                                         g_free,
> -                                                      connection_destroy);
> +                                                      NULL);


202 /* if not found, create a new connection and add to hash table */
203 Connection *connection_get(GHashTable *connection_track_table,
204                            ConnectionKey *key,
205                            GQueue *conn_list)
206 {
207     Connection *conn = g_hash_table_lookup(connection_track_table, key);
208
209     if (conn == NULL) {
210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
211
212         conn = connection_new(key);
213
214         if (g_hash_table_size(connection_track_table) > HASHTABLE_MAX_SIZE) {
215             trace_colo_proxy_main("colo proxy connection hashtable full,"
216                                   " clear it");
217 connection_hashtable_reset(connection_track_table);

197 void connection_hashtable_reset(GHashTable *connection_track_table)
198 {
199 g_hash_table_remove_all(connection_track_table);
200 }

IIUC,  above subroutine will do some cleanup explicitly. And before your patch, connection_hashtable_reset()
will release all keys and their values in this hashtable. But now, you remove all keys and just
one value(conn_list) instead. Does it means other values will be leaked ?


218 /*
219              * clear the conn_list
220 */
221             while (!g_queue_is_empty(conn_list)) {
222 connection_destroy(g_queue_pop_head(conn_list));
223 }
224 }
225
226         g_hash_table_insert(connection_track_table, new_key, conn);
227 }
228
229     return conn;
230 }


Thanks
Zhijian

>   
>       colo_compare_iothread(s);
>   
> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
> index bf05023dc3..c18c4c2019 100644
> --- a/net/filter-rewriter.c
> +++ b/net/filter-rewriter.c
> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
>       s->connection_track_table = g_hash_table_new_full(connection_key_hash,
>                                                         connection_key_equal,
>                                                         g_free,
> -                                                      connection_destroy);
> +                                                      NULL);
>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>   }
>   

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-21  3:05   ` lizhijian
@ 2022-03-28  9:13     ` Zhang, Chen
  2022-03-31  1:14       ` lizhijian
  0 siblings, 1 reply; 13+ messages in thread
From: Zhang, Chen @ 2022-03-28  9:13 UTC (permalink / raw)
  To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu



> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Monday, March 21, 2022 11:06 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>; lizhijian@fujitsu.com
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> 
> On 09/03/2022 16:38, Zhang Chen wrote:
> > We notice the QEMU may crash when the guest has too many incoming
> > network connections with the following log:
> >
> > 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> > hashtable full, clear it
> > free(): invalid pointer
> > [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >
> > This is because we create the s->connection_track_table with
> > g_hash_table_new_full() which is defined as:
> >
> > GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >                         GEqualFunc key_equal_func,
> >                         GDestroyNotify key_destroy_func,
> >                         GDestroyNotify value_destroy_func);
> >
> > The fourth parameter connection_destroy() will be called to free the
> > memory allocated for all 'Connection' values in the hashtable when we
> > call g_hash_table_remove_all() in the connection_hashtable_reset().
> >
> > It's unnecessary because we clear the conn_list explicitly later, and
> > it's buggy when other agents try to call connection_get() with the
> > same connection_track_table.
> >
> > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > ---
> >   net/colo-compare.c    | 2 +-
> >   net/filter-rewriter.c | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > 62554b5b3c..ab054cfd21 100644
> > --- a/net/colo-compare.c
> > +++ b/net/colo-compare.c
> > @@ -1324,7 +1324,7 @@ static void
> colo_compare_complete(UserCreatable *uc, Error **errp)
> >       s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> >                                                         connection_key_equal,
> >                                                         g_free,
> > -                                                      connection_destroy);
> > +                                                      NULL);
> 
> 
> 202 /* if not found, create a new connection and add to hash table */
> 203 Connection *connection_get(GHashTable *connection_track_table,
> 204                            ConnectionKey *key,
> 205                            GQueue *conn_list)
> 206 {
> 207     Connection *conn = g_hash_table_lookup(connection_track_table,
> key);
> 208
> 209     if (conn == NULL) {
> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> 211
> 212         conn = connection_new(key);
> 213
> 214         if (g_hash_table_size(connection_track_table) >
> HASHTABLE_MAX_SIZE) {
> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
> 216                                   " clear it");
> 217 connection_hashtable_reset(connection_track_table);
> 
> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
> 198 {
> 199 g_hash_table_remove_all(connection_track_table);
> 200 }
> 
> IIUC,  above subroutine will do some cleanup explicitly. And before your
> patch, connection_hashtable_reset() will release all keys and their values in
> this hashtable. But now, you remove all keys and just one value(conn_list)
> instead. Does it means other values will be leaked ?

Thanks Zhijian. Re-think about it. Yes, I think you are right.
It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
What do you think?

Thanks
Chen


> 
> 
> 218 /*
> 219              * clear the conn_list
> 220 */
> 221             while (!g_queue_is_empty(conn_list)) {
> 222 connection_destroy(g_queue_pop_head(conn_list));
> 223 }
> 224 }
> 225
> 226         g_hash_table_insert(connection_track_table, new_key, conn);
> 227 }
> 228
> 229     return conn;
> 230 }
> 
> 
> Thanks
> Zhijian
> 
> >
> >       colo_compare_iothread(s);
> >
> > diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> > bf05023dc3..c18c4c2019 100644
> > --- a/net/filter-rewriter.c
> > +++ b/net/filter-rewriter.c
> > @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
> Error **errp)
> >       s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> >                                                         connection_key_equal,
> >                                                         g_free,
> > -                                                      connection_destroy);
> > +                                                      NULL);
> >       s->incoming_queue =
> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >   }
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-28  9:13     ` Zhang, Chen
@ 2022-03-31  1:14       ` lizhijian
  2022-03-31  2:25         ` Zhang, Chen
  0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-03-31  1:14 UTC (permalink / raw)
  To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu


connection_track_table
-----+----------
key1 | conn    |-----------------------------------------------------------+
-----+----------                                                           |
key2 | conn    |----------------------------------+                        |
-----+----------                                  |                        |
key3 | conn    |---------+                        |                        |
-----+----------         |                        |                        |
                          |                        |                        |
                          |                        |                        |
     + CompareState ++    |                        |                        |
     |               |    V                        V                        V
     +---------------+   +---------------+         +---------------+
     |conn_list      +--->conn           +--------->conn           |     connx
     +---------------+   +---------------+         +---------------+
     |               |     |           |             |          |
     +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
                           |           |             |          |
                       +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
                           |           |             |          |
                       +---v----+  +---v----+    +---v----+ +---v----+
                       |primary |  |secondary    |primary | |secondary
                       |packet  |  |packet  +    |packet  | |packet  +
                       +--------+  +--------+    +--------+ +--------+
  
I recalled that we should above relationships between connection_track_table conn_list and conn.
That means both connection_track_table and conn_list reference to the same conn instance.

So before this patch, connection_get() is possible to use-after-free/double free conn. where 1st was in
connection_hashtable_reset() and 2nd was
221             while (!g_queue_is_empty(conn_list)) {
222                 connection_destroy(g_queue_pop_head(conn_list));
223             }

I also doubt that your current abort was just due to above use-after-free/double free.
If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd place.

Thanks
Zhijian


On 28/03/2022 17:13, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Monday, March 21, 2022 11:06 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>>
>> On 09/03/2022 16:38, Zhang Chen wrote:
>>> We notice the QEMU may crash when the guest has too many incoming
>>> network connections with the following log:
>>>
>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>> hashtable full, clear it
>>> free(): invalid pointer
>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>>>
>>> This is because we create the s->connection_track_table with
>>> g_hash_table_new_full() which is defined as:
>>>
>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>>                          GEqualFunc key_equal_func,
>>>                          GDestroyNotify key_destroy_func,
>>>                          GDestroyNotify value_destroy_func);
>>>
>>> The fourth parameter connection_destroy() will be called to free the
>>> memory allocated for all 'Connection' values in the hashtable when we
>>> call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>
>>> It's unnecessary because we clear the conn_list explicitly later, and
>>> it's buggy when other agents try to call connection_get() with the
>>> same connection_track_table.
>>>
>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>> ---
>>>    net/colo-compare.c    | 2 +-
>>>    net/filter-rewriter.c | 2 +-
>>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>> 62554b5b3c..ab054cfd21 100644
>>> --- a/net/colo-compare.c
>>> +++ b/net/colo-compare.c
>>> @@ -1324,7 +1324,7 @@ static void
>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>>        s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>>                                                          connection_key_equal,
>>>                                                          g_free,
>>> -                                                      connection_destroy);
>>> +                                                      NULL);
>>
>> 202 /* if not found, create a new connection and add to hash table */
>> 203 Connection *connection_get(GHashTable *connection_track_table,
>> 204                            ConnectionKey *key,
>> 205                            GQueue *conn_list)
>> 206 {
>> 207     Connection *conn = g_hash_table_lookup(connection_track_table,
>> key);
>> 208
>> 209     if (conn == NULL) {
>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>> 211
>> 212         conn = connection_new(key);
>> 213
>> 214         if (g_hash_table_size(connection_track_table) >
>> HASHTABLE_MAX_SIZE) {
>> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
>> 216                                   " clear it");
>> 217 connection_hashtable_reset(connection_track_table);
>>
>> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
>> 198 {
>> 199 g_hash_table_remove_all(connection_track_table);
>> 200 }
>>
>> IIUC,  above subroutine will do some cleanup explicitly. And before your
>> patch, connection_hashtable_reset() will release all keys and their values in
>> this hashtable. But now, you remove all keys and just one value(conn_list)
>> instead. Does it means other values will be leaked ?
> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
> What do you think?
>
> Thanks
> Chen
>
>
>>
>> 218 /*
>> 219              * clear the conn_list
>> 220 */
>> 221             while (!g_queue_is_empty(conn_list)) {
>> 222 connection_destroy(g_queue_pop_head(conn_list));
>> 223 }
>> 224 }
>> 225
>> 226         g_hash_table_insert(connection_track_table, new_key, conn);
>> 227 }
>> 228
>> 229     return conn;
>> 230 }
>>
>>
>> Thanks
>> Zhijian
>>
>>>        colo_compare_iothread(s);
>>>
>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>> bf05023dc3..c18c4c2019 100644
>>> --- a/net/filter-rewriter.c
>>> +++ b/net/filter-rewriter.c
>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
>> Error **errp)
>>>        s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>>                                                          connection_key_equal,
>>>                                                          g_free,
>>> -                                                      connection_destroy);
>>> +                                                      NULL);
>>>        s->incoming_queue =
>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>    }
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-31  1:14       ` lizhijian
@ 2022-03-31  2:25         ` Zhang, Chen
  2022-04-01  1:46           ` lizhijian
  0 siblings, 1 reply; 13+ messages in thread
From: Zhang, Chen @ 2022-03-31  2:25 UTC (permalink / raw)
  To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu



> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Thursday, March 31, 2022 9:15 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> connection_track_table
> -----+----------
> key1 | conn    |-----------------------------------------------------------+
> -----+----------                                                           |
> key2 | conn    |----------------------------------+                        |
> -----+----------                                  |                        |
> key3 | conn    |---------+                        |                        |
> -----+----------         |                        |                        |
>                           |                        |                        |
>                           |                        |                        |
>      + CompareState ++    |                        |                        |
>      |               |    V                        V                        V
>      +---------------+   +---------------+         +---------------+
>      |conn_list      +--->conn           +--------->conn           |     connx
>      +---------------+   +---------------+         +---------------+
>      |               |     |           |             |          |
>      +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
>                            |           |             |          |
>                        +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
>                            |           |             |          |
>                        +---v----+  +---v----+    +---v----+ +---v----+
>                        |primary |  |secondary    |primary | |secondary
>                        |packet  |  |packet  +    |packet  | |packet  +
>                        +--------+  +--------+    +--------+ +--------+
> 
> I recalled that we should above relationships between
> connection_track_table conn_list and conn.
> That means both connection_track_table and conn_list reference to the
> same conn instance.
> 
> So before this patch, connection_get() is possible to use-after-free/double
> free conn. where 1st was in
> connection_hashtable_reset() and 2nd was
> 221             while (!g_queue_is_empty(conn_list)) {
> 222                 connection_destroy(g_queue_pop_head(conn_list));
> 223             }
> 
> I also doubt that your current abort was just due to above use-after-
> free/double free.
> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
> place.

Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
221             while (!g_queue_is_empty(conn_list)) {
 222                 connection_destroy(g_queue_pop_head(conn_list));
 223             }.
It also avoid use-after-free/double free conn.
Maybe we can keep the original version to fix it?

Thanks
Chen

> 
> Thanks
> Zhijian
> 
> 
> On 28/03/2022 17:13, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Monday, March 21, 2022 11:06 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >>
> >> On 09/03/2022 16:38, Zhang Chen wrote:
> >>> We notice the QEMU may crash when the guest has too many incoming
> >>> network connections with the following log:
> >>>
> >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>> hashtable full, clear it
> >>> free(): invalid pointer
> >>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >>>
> >>> This is because we create the s->connection_track_table with
> >>> g_hash_table_new_full() which is defined as:
> >>>
> >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>>                          GEqualFunc key_equal_func,
> >>>                          GDestroyNotify key_destroy_func,
> >>>                          GDestroyNotify value_destroy_func);
> >>>
> >>> The fourth parameter connection_destroy() will be called to free the
> >>> memory allocated for all 'Connection' values in the hashtable when
> >>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
> >>>
> >>> It's unnecessary because we clear the conn_list explicitly later,
> >>> and it's buggy when other agents try to call connection_get() with
> >>> the same connection_track_table.
> >>>
> >>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>> ---
> >>>    net/colo-compare.c    | 2 +-
> >>>    net/filter-rewriter.c | 2 +-
> >>>    2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>> 62554b5b3c..ab054cfd21 100644
> >>> --- a/net/colo-compare.c
> >>> +++ b/net/colo-compare.c
> >>> @@ -1324,7 +1324,7 @@ static void
> >> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>>        s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>>                                                          connection_key_equal,
> >>>                                                          g_free,
> >>> -                                                      connection_destroy);
> >>> +                                                      NULL);
> >>
> >> 202 /* if not found, create a new connection and add to hash table */
> >> 203 Connection *connection_get(GHashTable *connection_track_table,
> >> 204                            ConnectionKey *key,
> >> 205                            GQueue *conn_list)
> >> 206 {
> >> 207     Connection *conn =
> >> g_hash_table_lookup(connection_track_table,
> >> key);
> >> 208
> >> 209     if (conn == NULL) {
> >> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >> 211
> >> 212         conn = connection_new(key);
> >> 213
> >> 214         if (g_hash_table_size(connection_track_table) >
> >> HASHTABLE_MAX_SIZE) {
> >> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
> >> 216                                   " clear it");
> >> 217 connection_hashtable_reset(connection_track_table);
> >>
> >> 197 void connection_hashtable_reset(GHashTable
> >> *connection_track_table)
> >> 198 {
> >> 199 g_hash_table_remove_all(connection_track_table);
> >> 200 }
> >>
> >> IIUC,  above subroutine will do some cleanup explicitly. And before
> >> your patch, connection_hashtable_reset() will release all keys and
> >> their values in this hashtable. But now, you remove all keys and just
> >> one value(conn_list) instead. Does it means other values will be leaked ?
> > Thanks Zhijian. Re-think about it. Yes, I think you are right.
> > It looks need a lock to avoid into connection_get() when triggered the clear
> hashtable operation.
> > What do you think?
> >
> > Thanks
> > Chen
> >
> >
> >>
> >> 218 /*
> >> 219              * clear the conn_list
> >> 220 */
> >> 221             while (!g_queue_is_empty(conn_list)) {
> >> 222 connection_destroy(g_queue_pop_head(conn_list));
> >> 223 }
> >> 224 }
> >> 225
> >> 226         g_hash_table_insert(connection_track_table, new_key,
> >> conn);
> >> 227 }
> >> 228
> >> 229     return conn;
> >> 230 }
> >>
> >>
> >> Thanks
> >> Zhijian
> >>
> >>>        colo_compare_iothread(s);
> >>>
> >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>> bf05023dc3..c18c4c2019 100644
> >>> --- a/net/filter-rewriter.c
> >>> +++ b/net/filter-rewriter.c
> >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>> *nf,
> >> Error **errp)
> >>>        s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>>                                                          connection_key_equal,
> >>>                                                          g_free,
> >>> -                                                      connection_destroy);
> >>> +                                                      NULL);
> >>>        s->incoming_queue =
> >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>>    }
> >>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-03-31  2:25         ` Zhang, Chen
@ 2022-04-01  1:46           ` lizhijian
  2022-04-01  3:33             ` Zhang, Chen
  0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-04-01  1:46 UTC (permalink / raw)
  To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu



On 31/03/2022 10:25, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Thursday, March 31, 2022 9:15 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>> connection_track_table
>> -----+----------
>> key1 | conn    |-----------------------------------------------------------+
>> -----+----------                                                           |
>> key2 | conn    |----------------------------------+                        |
>> -----+----------                                  |                        |
>> key3 | conn    |---------+                        |                        |
>> -----+----------         |                        |                        |
>>                            |                        |                        |
>>                            |                        |                        |
>>       + CompareState ++    |                        |                        |
>>       |               |    V                        V                        V
>>       +---------------+   +---------------+         +---------------+
>>       |conn_list      +--->conn           +--------->conn           |     connx
>>       +---------------+   +---------------+         +---------------+
>>       |               |     |           |             |          |
>>       +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>                             |           |             |          |
>>                         +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>                             |           |             |          |
>>                         +---v----+  +---v----+    +---v----+ +---v----+
>>                         |primary |  |secondary    |primary | |secondary
>>                         |packet  |  |packet  +    |packet  | |packet  +
>>                         +--------+  +--------+    +--------+ +--------+
>>
>> I recalled that we should above relationships between
>> connection_track_table conn_list and conn.
>> That means both connection_track_table and conn_list reference to the
>> same conn instance.
>>
>> So before this patch, connection_get() is possible to use-after-free/double
>> free conn. where 1st was in
>> connection_hashtable_reset() and 2nd was
>> 221             while (!g_queue_is_empty(conn_list)) {
>> 222                 connection_destroy(g_queue_pop_head(conn_list));
>> 223             }
>>
>> I also doubt that your current abort was just due to above use-after-
>> free/double free.
>> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
>> place.
> Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
> 221             while (!g_queue_is_empty(conn_list)) {
>   222                 connection_destroy(g_queue_pop_head(conn_list));
>   223             }.
> It also avoid use-after-free/double free conn.
Although you will not use-after-free here, you have to consider other situations carefully that
g_hash_table_remove_all() g_hash_table_destroy() were called where the conn_list should also be freed
with you approach.




> Maybe we can keep the original version to fix it?
And your commit log should be more clear.

Thanks
Zhijian

>
> Thanks
> Chen
>
>> Thanks
>> Zhijian
>>
>>
>> On 28/03/2022 17:13, Zhang, Chen wrote:
>>>> -----Original Message-----
>>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>>>> Sent: Monday, March 21, 2022 11:06 AM
>>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
>>>> <like.xu@linux.intel.com>
>>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
>>>> the conn_list
>>>>
>>>>
>>>>
>>>> On 09/03/2022 16:38, Zhang Chen wrote:
>>>>> We notice the QEMU may crash when the guest has too many incoming
>>>>> network connections with the following log:
>>>>>
>>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>>>> hashtable full, clear it
>>>>> free(): invalid pointer
>>>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
>>>>>
>>>>> This is because we create the s->connection_track_table with
>>>>> g_hash_table_new_full() which is defined as:
>>>>>
>>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>>>>                           GEqualFunc key_equal_func,
>>>>>                           GDestroyNotify key_destroy_func,
>>>>>                           GDestroyNotify value_destroy_func);
>>>>>
>>>>> The fourth parameter connection_destroy() will be called to free the
>>>>> memory allocated for all 'Connection' values in the hashtable when
>>>>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>>>
>>>>> It's unnecessary because we clear the conn_list explicitly later,
>>>>> and it's buggy when other agents try to call connection_get() with
>>>>> the same connection_track_table.
>>>>>
>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>> ---
>>>>>     net/colo-compare.c    | 2 +-
>>>>>     net/filter-rewriter.c | 2 +-
>>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>>>> 62554b5b3c..ab054cfd21 100644
>>>>> --- a/net/colo-compare.c
>>>>> +++ b/net/colo-compare.c
>>>>> @@ -1324,7 +1324,7 @@ static void
>>>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>>>>         s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>>                                                           connection_key_equal,
>>>>>                                                           g_free,
>>>>> -                                                      connection_destroy);
>>>>> +                                                      NULL);
>>>> 202 /* if not found, create a new connection and add to hash table */
>>>> 203 Connection *connection_get(GHashTable *connection_track_table,
>>>> 204                            ConnectionKey *key,
>>>> 205                            GQueue *conn_list)
>>>> 206 {
>>>> 207     Connection *conn =
>>>> g_hash_table_lookup(connection_track_table,
>>>> key);
>>>> 208
>>>> 209     if (conn == NULL) {
>>>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>>>> 211
>>>> 212         conn = connection_new(key);
>>>> 213
>>>> 214         if (g_hash_table_size(connection_track_table) >
>>>> HASHTABLE_MAX_SIZE) {
>>>> 215             trace_colo_proxy_main("colo proxy connection hashtable full,"
>>>> 216                                   " clear it");
>>>> 217 connection_hashtable_reset(connection_track_table);
>>>>
>>>> 197 void connection_hashtable_reset(GHashTable
>>>> *connection_track_table)
>>>> 198 {
>>>> 199 g_hash_table_remove_all(connection_track_table);
>>>> 200 }
>>>>
>>>> IIUC,  above subroutine will do some cleanup explicitly. And before
>>>> your patch, connection_hashtable_reset() will release all keys and
>>>> their values in this hashtable. But now, you remove all keys and just
>>>> one value(conn_list) instead. Does it means other values will be leaked ?
>>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
>>> It looks need a lock to avoid into connection_get() when triggered the clear
>> hashtable operation.
>>> What do you think?
>>>
>>> Thanks
>>> Chen
>>>
>>>
>>>> 218 /*
>>>> 219              * clear the conn_list
>>>> 220 */
>>>> 221             while (!g_queue_is_empty(conn_list)) {
>>>> 222 connection_destroy(g_queue_pop_head(conn_list));
>>>> 223 }
>>>> 224 }
>>>> 225
>>>> 226         g_hash_table_insert(connection_track_table, new_key,
>>>> conn);
>>>> 227 }
>>>> 228
>>>> 229     return conn;
>>>> 230 }
>>>>
>>>>
>>>> Thanks
>>>> Zhijian
>>>>
>>>>>         colo_compare_iothread(s);
>>>>>
>>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>>>> bf05023dc3..c18c4c2019 100644
>>>>> --- a/net/filter-rewriter.c
>>>>> +++ b/net/filter-rewriter.c
>>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
>>>>> *nf,
>>>> Error **errp)
>>>>>         s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>>                                                           connection_key_equal,
>>>>>                                                           g_free,
>>>>> -                                                      connection_destroy);
>>>>> +                                                      NULL);
>>>>>         s->incoming_queue =
>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>>     }
>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
  2022-04-01  1:46           ` lizhijian
@ 2022-04-01  3:33             ` Zhang, Chen
  0 siblings, 0 replies; 13+ messages in thread
From: Zhang, Chen @ 2022-04-01  3:33 UTC (permalink / raw)
  To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu



> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Friday, April 1, 2022 9:47 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
> 
> 
> 
> On 31/03/2022 10:25, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Thursday, March 31, 2022 9:15 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >> connection_track_table
> >> -----+----------
> >> key1 | conn    |-----------------------------------------------------------+
> >> -----+----------                                                           |
> >> key2 | conn    |----------------------------------+                        |
> >> -----+----------                                  |                        |
> >> key3 | conn    |---------+                        |                        |
> >> -----+----------         |                        |                        |
> >>                            |                        |                        |
> >>                            |                        |                        |
> >>       + CompareState ++    |                        |                        |
> >>       |               |    V                        V                        V
> >>       +---------------+   +---------------+         +---------------+
> >>       |conn_list      +--->conn           +--------->conn           |     connx
> >>       +---------------+   +---------------+         +---------------+
> >>       |               |     |           |             |          |
> >>       +---------------+ +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>                             |           |             |          |
> >>                         +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>                             |           |             |          |
> >>                         +---v----+  +---v----+    +---v----+ +---v----+
> >>                         |primary |  |secondary    |primary | |secondary
> >>                         |packet  |  |packet  +    |packet  | |packet  +
> >>                         +--------+  +--------+    +--------+ +--------+
> >>
> >> I recalled that we should above relationships between
> >> connection_track_table conn_list and conn.
> >> That means both connection_track_table and conn_list reference to the
> >> same conn instance.
> >>
> >> So before this patch, connection_get() is possible to
> >> use-after-free/double free conn. where 1st was in
> >> connection_hashtable_reset() and 2nd was
> >> 221             while (!g_queue_is_empty(conn_list)) {
> >> 222                 connection_destroy(g_queue_pop_head(conn_list));
> >> 223             }
> >>
> >> I also doubt that your current abort was just due to above use-after-
> >> free/double free.
> >> If so, looks it's enough we just update to g_queue_clear(conn_list)
> >> in the 2nd place.
> > Make sense, but It also means the original patch works here, skip free conn
> in connection_hashtable_reset() and do it in:
> > 221             while (!g_queue_is_empty(conn_list)) {
> >   222                 connection_destroy(g_queue_pop_head(conn_list));
> >   223             }.
> > It also avoid use-after-free/double free conn.
> Although you will not use-after-free here, you have to consider other
> situations carefully that
> g_hash_table_remove_all() g_hash_table_destroy() were called where the
> conn_list should also be freed with you approach.
> 
> 

I re-checked the code, it looks fine to me.

> 
> 
> > Maybe we can keep the original version to fix it?
> And your commit log should be more clear.

OK, I will update V2 for commit log.

Thanks
Chen 

> 
> Thanks
> Zhijian
> 
> >
> > Thanks
> > Chen
> >
> >> Thanks
> >> Zhijian
> >>
> >>
> >> On 28/03/2022 17:13, Zhang, Chen wrote:
> >>>> -----Original Message-----
> >>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >>>> Sent: Monday, March 21, 2022 11:06 AM
> >>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >>>> <like.xu@linux.intel.com>
> >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to
> >>>> clear the conn_list
> >>>>
> >>>>
> >>>>
> >>>> On 09/03/2022 16:38, Zhang Chen wrote:
> >>>>> We notice the QEMU may crash when the guest has too many
> incoming
> >>>>> network connections with the following log:
> >>>>>
> >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>>>> hashtable full, clear it
> >>>>> free(): invalid pointer
> >>>>> [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
> >>>>>
> >>>>> This is because we create the s->connection_track_table with
> >>>>> g_hash_table_new_full() which is defined as:
> >>>>>
> >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>>>>                           GEqualFunc key_equal_func,
> >>>>>                           GDestroyNotify key_destroy_func,
> >>>>>                           GDestroyNotify value_destroy_func);
> >>>>>
> >>>>> The fourth parameter connection_destroy() will be called to free
> >>>>> the memory allocated for all 'Connection' values in the hashtable
> >>>>> when we call g_hash_table_remove_all() in the
> connection_hashtable_reset().
> >>>>>
> >>>>> It's unnecessary because we clear the conn_list explicitly later,
> >>>>> and it's buggy when other agents try to call connection_get() with
> >>>>> the same connection_track_table.
> >>>>>
> >>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>>>> ---
> >>>>>     net/colo-compare.c    | 2 +-
> >>>>>     net/filter-rewriter.c | 2 +-
> >>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>>>> 62554b5b3c..ab054cfd21 100644
> >>>>> --- a/net/colo-compare.c
> >>>>> +++ b/net/colo-compare.c
> >>>>> @@ -1324,7 +1324,7 @@ static void
> >>>> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>>>>         s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>>                                                           connection_key_equal,
> >>>>>                                                           g_free,
> >>>>> -                                                      connection_destroy);
> >>>>> +                                                      NULL);
> >>>> 202 /* if not found, create a new connection and add to hash table
> >>>> */
> >>>> 203 Connection *connection_get(GHashTable *connection_track_table,
> >>>> 204                            ConnectionKey *key,
> >>>> 205                            GQueue *conn_list)
> >>>> 206 {
> >>>> 207     Connection *conn =
> >>>> g_hash_table_lookup(connection_track_table,
> >>>> key);
> >>>> 208
> >>>> 209     if (conn == NULL) {
> >>>> 210         ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >>>> 211
> >>>> 212         conn = connection_new(key);
> >>>> 213
> >>>> 214         if (g_hash_table_size(connection_track_table) >
> >>>> HASHTABLE_MAX_SIZE) {
> >>>> 215             trace_colo_proxy_main("colo proxy connection hashtable
> full,"
> >>>> 216                                   " clear it");
> >>>> 217 connection_hashtable_reset(connection_track_table);
> >>>>
> >>>> 197 void connection_hashtable_reset(GHashTable
> >>>> *connection_track_table)
> >>>> 198 {
> >>>> 199 g_hash_table_remove_all(connection_track_table);
> >>>> 200 }
> >>>>
> >>>> IIUC,  above subroutine will do some cleanup explicitly. And before
> >>>> your patch, connection_hashtable_reset() will release all keys and
> >>>> their values in this hashtable. But now, you remove all keys and
> >>>> just one value(conn_list) instead. Does it means other values will be
> leaked ?
> >>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> >>> It looks need a lock to avoid into connection_get() when triggered
> >>> the clear
> >> hashtable operation.
> >>> What do you think?
> >>>
> >>> Thanks
> >>> Chen
> >>>
> >>>
> >>>> 218 /*
> >>>> 219              * clear the conn_list
> >>>> 220 */
> >>>> 221             while (!g_queue_is_empty(conn_list)) {
> >>>> 222 connection_destroy(g_queue_pop_head(conn_list));
> >>>> 223 }
> >>>> 224 }
> >>>> 225
> >>>> 226         g_hash_table_insert(connection_track_table, new_key,
> >>>> conn);
> >>>> 227 }
> >>>> 228
> >>>> 229     return conn;
> >>>> 230 }
> >>>>
> >>>>
> >>>> Thanks
> >>>> Zhijian
> >>>>
> >>>>>         colo_compare_iothread(s);
> >>>>>
> >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>>>> bf05023dc3..c18c4c2019 100644
> >>>>> --- a/net/filter-rewriter.c
> >>>>> +++ b/net/filter-rewriter.c
> >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>>>> *nf,
> >>>> Error **errp)
> >>>>>         s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>>                                                           connection_key_equal,
> >>>>>                                                           g_free,
> >>>>> -                                                      connection_destroy);
> >>>>> +                                                      NULL);
> >>>>>         s->incoming_queue =
> >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>>>>     }
> >>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter
  2022-03-09  8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
  2022-03-09  8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
  2022-03-09  8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
@ 2022-03-09  8:38 ` Zhang Chen
  2022-03-21  3:16   ` lizhijian
  2022-03-09  8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
  3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09  8:38 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev

Filter-rewriter no need to track connection in conn_list.
This patch fix the glib g_queue_is_empty assertion when COLO guest
keep a lot of network connection.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo.c b/net/colo.c
index 1f8162f59f..694f3c93ef 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table,
             /*
              * clear the conn_list
              */
-            while (!g_queue_is_empty(conn_list)) {
+            while (conn_list && !g_queue_is_empty(conn_list)) {
                 connection_destroy(g_queue_pop_head(conn_list));
             }
         }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter
  2022-03-09  8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
@ 2022-03-21  3:16   ` lizhijian
  0 siblings, 0 replies; 13+ messages in thread
From: lizhijian @ 2022-03-21  3:16 UTC (permalink / raw)
  To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: qemu-dev



On 09/03/2022 16:38, Zhang Chen wrote:
> Filter-rewriter no need to track connection in conn_list.
> This patch fix the glib g_queue_is_empty assertion when COLO guest
> keep a lot of network connection.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
LGTM.

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   net/colo.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/colo.c b/net/colo.c
> index 1f8162f59f..694f3c93ef 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table,
>               /*
>                * clear the conn_list
>                */
> -            while (!g_queue_is_empty(conn_list)) {
> +            while (conn_list && !g_queue_is_empty(conn_list)) {
>                   connection_destroy(g_queue_pop_head(conn_list));
>               }
>           }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly
  2022-03-09  8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
                   ` (2 preceding siblings ...)
  2022-03-09  8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
@ 2022-03-09  8:38 ` Zhang Chen
  2022-03-21  3:39   ` lizhijian
  3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09  8:38 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, Tao Xu, qemu-dev

When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo.c       | 9 ++++++++-
 net/trace-events | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/colo.c b/net/colo.c
index 694f3c93ef..6b0ff562ad 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
     static const uint8_t vlan[] = {0x81, 0x00};
     uint8_t *data = pkt->data + pkt->vnet_hdr_len;
     uint16_t l3_proto;
-    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+    ssize_t l2hdr_len;
+
+    if (data == NULL) {
+        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
+                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
+        return 1;
+    }
+    l2hdr_len = eth_get_l2_hdr_length(data);
 
     if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < ETH_HLEN");
diff --git a/net/trace-events b/net/trace-events
index d7a17256cc..6af927b4b9 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
 
 # colo.c
 colo_proxy_main(const char *chr) ": %s"
+colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
 
 # colo-compare.c
 colo_compare_main(const char *chr) ": %s"
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly
  2022-03-09  8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
@ 2022-03-21  3:39   ` lizhijian
  0 siblings, 0 replies; 13+ messages in thread
From: lizhijian @ 2022-03-21  3:39 UTC (permalink / raw)
  To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: Tao Xu, qemu-dev



On 09/03/2022 16:38, Zhang Chen wrote:
> When COLO use only one vnet_hdr_support parameter between
> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> with segmentation fault. Back track as follow:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> (gdb) bt
> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> net/colo.c:49
> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> net/filter-rewriter.c:63
>
> So wrong vnet_hdr_len will cause pkt->data become NULL.
Not sure if we can check this earlier, well

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> Add check to
> raise error and add trace-events to track vnet_hdr_len.
>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>


> ---
>   net/colo.c       | 9 ++++++++-
>   net/trace-events | 1 +
>   2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/colo.c b/net/colo.c
> index 694f3c93ef..6b0ff562ad 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
>       static const uint8_t vlan[] = {0x81, 0x00};
>       uint8_t *data = pkt->data + pkt->vnet_hdr_len;
>       uint16_t l3_proto;
> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> +    ssize_t l2hdr_len;
> +
> +    if (data == NULL) {
> +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> +        return 1;
> +    }
> +    l2hdr_len = eth_get_l2_hdr_length(data);
>   
>       if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
>           trace_colo_proxy_main("pkt->size < ETH_HLEN");
> diff --git a/net/trace-events b/net/trace-events
> index d7a17256cc..6af927b4b9 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
>   
>   # colo.c
>   colo_proxy_main(const char *chr) ": %s"
> +colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
>   
>   # colo-compare.c
>   colo_compare_main(const char *chr) ": %s"

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-01  3:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-09  8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
2022-03-09  8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
2022-03-09  8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
2022-03-21  3:05   ` lizhijian
2022-03-28  9:13     ` Zhang, Chen
2022-03-31  1:14       ` lizhijian
2022-03-31  2:25         ` Zhang, Chen
2022-04-01  1:46           ` lizhijian
2022-04-01  3:33             ` Zhang, Chen
2022-03-09  8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
2022-03-21  3:16   ` lizhijian
2022-03-09  8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
2022-03-21  3:39   ` lizhijian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).