* [PATCH 0/4] COLO net and runstate bugfix/optimization @ 2022-03-09 8:38 Zhang Chen 2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw) To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev This series fix some COLO related issues in internal stress testing. Zhang Chen (4): softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH net/colo: Fix a "double free" crash to clear the conn_list net/colo.c: No need to track conn_list for filter-rewriter net/colo.c: fix segmentation fault when packet is not parsed correctly net/colo-compare.c | 2 +- net/colo.c | 11 +++++++++-- net/filter-rewriter.c | 2 +- net/trace-events | 1 + softmmu/runstate.c | 1 + 5 files changed, 13 insertions(+), 4 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH 2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen @ 2022-03-09 8:38 ` Zhang Chen 2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw) To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu If the checkpoint occurs when the guest finishes restarting but has not started running, the runstate_set() may reject the transition from COLO to PRELAUNCH with the crash log: {"timestamp": {"seconds": 1593484591, "microseconds": 26605},\ "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}} qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch' Long-term testing says that it's pretty safe. Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> --- softmmu/runstate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/softmmu/runstate.c b/softmmu/runstate.c index e0d869b21a..c021c56338 100644 --- a/softmmu/runstate.c +++ b/softmmu/runstate.c @@ -127,6 +127,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH }, { RUN_STATE_COLO, RUN_STATE_RUNNING }, + { RUN_STATE_COLO, RUN_STATE_PRELAUNCH }, { RUN_STATE_COLO, RUN_STATE_SHUTDOWN}, { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen 2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen @ 2022-03-09 8:38 ` Zhang Chen 2022-03-21 3:05 ` lizhijian 2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen 2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen 3 siblings, 1 reply; 13+ messages in thread From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw) To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu We notice the QEMU may crash when the guest has too many incoming network connections with the following log: 15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it free(): invalid pointer [1] 15195 abort (core dumped) qemu-system-x86_64 .... This is because we create the s->connection_track_table with g_hash_table_new_full() which is defined as: GHashTable * g_hash_table_new_full (GHashFunc hash_func, GEqualFunc key_equal_func, GDestroyNotify key_destroy_func, GDestroyNotify value_destroy_func); The fourth parameter connection_destroy() will be called to free the memory allocated for all 'Connection' values in the hashtable when we call g_hash_table_remove_all() in the connection_hashtable_reset(). It's unnecessary because we clear the conn_list explicitly later, and it's buggy when other agents try to call connection_get() with the same connection_track_table. Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> --- net/colo-compare.c | 2 +- net/filter-rewriter.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 62554b5b3c..ab054cfd21 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, g_free, - connection_destroy); + NULL); colo_compare_iothread(s); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index bf05023dc3..c18c4c2019 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp) s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, g_free, - connection_destroy); + NULL); s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen @ 2022-03-21 3:05 ` lizhijian 2022-03-28 9:13 ` Zhang, Chen 0 siblings, 1 reply; 13+ messages in thread From: lizhijian @ 2022-03-21 3:05 UTC (permalink / raw) To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: qemu-dev, Like Xu On 09/03/2022 16:38, Zhang Chen wrote: > We notice the QEMU may crash when the guest has too many > incoming network connections with the following log: > > 15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it > free(): invalid pointer > [1] 15195 abort (core dumped) qemu-system-x86_64 .... > > This is because we create the s->connection_track_table with > g_hash_table_new_full() which is defined as: > > GHashTable * g_hash_table_new_full (GHashFunc hash_func, > GEqualFunc key_equal_func, > GDestroyNotify key_destroy_func, > GDestroyNotify value_destroy_func); > > The fourth parameter connection_destroy() will be called to free the > memory allocated for all 'Connection' values in the hashtable when > we call g_hash_table_remove_all() in the connection_hashtable_reset(). > > It's unnecessary because we clear the conn_list explicitly later, > and it's buggy when other agents try to call connection_get() > with the same connection_track_table. > > Signed-off-by: Like Xu <like.xu@linux.intel.com> > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > --- > net/colo-compare.c | 2 +- > net/filter-rewriter.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/colo-compare.c b/net/colo-compare.c > index 62554b5b3c..ab054cfd21 100644 > --- a/net/colo-compare.c > +++ b/net/colo-compare.c > @@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) > s->connection_track_table = g_hash_table_new_full(connection_key_hash, > connection_key_equal, > g_free, > - connection_destroy); > + NULL); 202 /* if not found, create a new connection and add to hash table */ 203 Connection *connection_get(GHashTable *connection_track_table, 204 ConnectionKey *key, 205 GQueue *conn_list) 206 { 207 Connection *conn = g_hash_table_lookup(connection_track_table, key); 208 209 if (conn == NULL) { 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); 211 212 conn = connection_new(key); 213 214 if (g_hash_table_size(connection_track_table) > HASHTABLE_MAX_SIZE) { 215 trace_colo_proxy_main("colo proxy connection hashtable full," 216 " clear it"); 217 connection_hashtable_reset(connection_track_table); 197 void connection_hashtable_reset(GHashTable *connection_track_table) 198 { 199 g_hash_table_remove_all(connection_track_table); 200 } IIUC, above subroutine will do some cleanup explicitly. And before your patch, connection_hashtable_reset() will release all keys and their values in this hashtable. But now, you remove all keys and just one value(conn_list) instead. Does it means other values will be leaked ? 218 /* 219 * clear the conn_list 220 */ 221 while (!g_queue_is_empty(conn_list)) { 222 connection_destroy(g_queue_pop_head(conn_list)); 223 } 224 } 225 226 g_hash_table_insert(connection_track_table, new_key, conn); 227 } 228 229 return conn; 230 } Thanks Zhijian > > colo_compare_iothread(s); > > diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c > index bf05023dc3..c18c4c2019 100644 > --- a/net/filter-rewriter.c > +++ b/net/filter-rewriter.c > @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp) > s->connection_track_table = g_hash_table_new_full(connection_key_hash, > connection_key_equal, > g_free, > - connection_destroy); > + NULL); > s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); > } > ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-21 3:05 ` lizhijian @ 2022-03-28 9:13 ` Zhang, Chen 2022-03-31 1:14 ` lizhijian 0 siblings, 1 reply; 13+ messages in thread From: Zhang, Chen @ 2022-03-28 9:13 UTC (permalink / raw) To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu > -----Original Message----- > From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > Sent: Monday, March 21, 2022 11:06 AM > To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > <jasowang@redhat.com>; lizhijian@fujitsu.com > Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com> > Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the > conn_list > > > > On 09/03/2022 16:38, Zhang Chen wrote: > > We notice the QEMU may crash when the guest has too many incoming > > network connections with the following log: > > > > 15197@1593578622.668573:colo_proxy_main : colo proxy connection > > hashtable full, clear it > > free(): invalid pointer > > [1] 15195 abort (core dumped) qemu-system-x86_64 .... > > > > This is because we create the s->connection_track_table with > > g_hash_table_new_full() which is defined as: > > > > GHashTable * g_hash_table_new_full (GHashFunc hash_func, > > GEqualFunc key_equal_func, > > GDestroyNotify key_destroy_func, > > GDestroyNotify value_destroy_func); > > > > The fourth parameter connection_destroy() will be called to free the > > memory allocated for all 'Connection' values in the hashtable when we > > call g_hash_table_remove_all() in the connection_hashtable_reset(). > > > > It's unnecessary because we clear the conn_list explicitly later, and > > it's buggy when other agents try to call connection_get() with the > > same connection_track_table. > > > > Signed-off-by: Like Xu <like.xu@linux.intel.com> > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > > --- > > net/colo-compare.c | 2 +- > > net/filter-rewriter.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > > 62554b5b3c..ab054cfd21 100644 > > --- a/net/colo-compare.c > > +++ b/net/colo-compare.c > > @@ -1324,7 +1324,7 @@ static void > colo_compare_complete(UserCreatable *uc, Error **errp) > > s->connection_track_table = > g_hash_table_new_full(connection_key_hash, > > connection_key_equal, > > g_free, > > - connection_destroy); > > + NULL); > > > 202 /* if not found, create a new connection and add to hash table */ > 203 Connection *connection_get(GHashTable *connection_track_table, > 204 ConnectionKey *key, > 205 GQueue *conn_list) > 206 { > 207 Connection *conn = g_hash_table_lookup(connection_track_table, > key); > 208 > 209 if (conn == NULL) { > 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); > 211 > 212 conn = connection_new(key); > 213 > 214 if (g_hash_table_size(connection_track_table) > > HASHTABLE_MAX_SIZE) { > 215 trace_colo_proxy_main("colo proxy connection hashtable full," > 216 " clear it"); > 217 connection_hashtable_reset(connection_track_table); > > 197 void connection_hashtable_reset(GHashTable *connection_track_table) > 198 { > 199 g_hash_table_remove_all(connection_track_table); > 200 } > > IIUC, above subroutine will do some cleanup explicitly. And before your > patch, connection_hashtable_reset() will release all keys and their values in > this hashtable. But now, you remove all keys and just one value(conn_list) > instead. Does it means other values will be leaked ? Thanks Zhijian. Re-think about it. Yes, I think you are right. It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation. What do you think? Thanks Chen > > > 218 /* > 219 * clear the conn_list > 220 */ > 221 while (!g_queue_is_empty(conn_list)) { > 222 connection_destroy(g_queue_pop_head(conn_list)); > 223 } > 224 } > 225 > 226 g_hash_table_insert(connection_track_table, new_key, conn); > 227 } > 228 > 229 return conn; > 230 } > > > Thanks > Zhijian > > > > > colo_compare_iothread(s); > > > > diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index > > bf05023dc3..c18c4c2019 100644 > > --- a/net/filter-rewriter.c > > +++ b/net/filter-rewriter.c > > @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, > Error **errp) > > s->connection_track_table = > g_hash_table_new_full(connection_key_hash, > > connection_key_equal, > > g_free, > > - connection_destroy); > > + NULL); > > s->incoming_queue = > qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); > > } > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-28 9:13 ` Zhang, Chen @ 2022-03-31 1:14 ` lizhijian 2022-03-31 2:25 ` Zhang, Chen 0 siblings, 1 reply; 13+ messages in thread From: lizhijian @ 2022-03-31 1:14 UTC (permalink / raw) To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu connection_track_table -----+---------- key1 | conn |-----------------------------------------------------------+ -----+---------- | key2 | conn |----------------------------------+ | -----+---------- | | key3 | conn |---------+ | | -----+---------- | | | | | | | | | + CompareState ++ | | | | | V V V +---------------+ +---------------+ +---------------+ |conn_list +--->conn +--------->conn | connx +---------------+ +---------------+ +---------------+ | | | | | | +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ I recalled that we should above relationships between connection_track_table conn_list and conn. That means both connection_track_table and conn_list reference to the same conn instance. So before this patch, connection_get() is possible to use-after-free/double free conn. where 1st was in connection_hashtable_reset() and 2nd was 221 while (!g_queue_is_empty(conn_list)) { 222 connection_destroy(g_queue_pop_head(conn_list)); 223 } I also doubt that your current abort was just due to above use-after-free/double free. If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd place. Thanks Zhijian On 28/03/2022 17:13, Zhang, Chen wrote: > >> -----Original Message----- >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> >> Sent: Monday, March 21, 2022 11:06 AM >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang >> <jasowang@redhat.com>; lizhijian@fujitsu.com >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the >> conn_list >> >> >> >> On 09/03/2022 16:38, Zhang Chen wrote: >>> We notice the QEMU may crash when the guest has too many incoming >>> network connections with the following log: >>> >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection >>> hashtable full, clear it >>> free(): invalid pointer >>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... >>> >>> This is because we create the s->connection_track_table with >>> g_hash_table_new_full() which is defined as: >>> >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, >>> GEqualFunc key_equal_func, >>> GDestroyNotify key_destroy_func, >>> GDestroyNotify value_destroy_func); >>> >>> The fourth parameter connection_destroy() will be called to free the >>> memory allocated for all 'Connection' values in the hashtable when we >>> call g_hash_table_remove_all() in the connection_hashtable_reset(). >>> >>> It's unnecessary because we clear the conn_list explicitly later, and >>> it's buggy when other agents try to call connection_get() with the >>> same connection_track_table. >>> >>> Signed-off-by: Like Xu <like.xu@linux.intel.com> >>> Signed-off-by: Zhang Chen <chen.zhang@intel.com> >>> --- >>> net/colo-compare.c | 2 +- >>> net/filter-rewriter.c | 2 +- >>> 2 files changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index >>> 62554b5b3c..ab054cfd21 100644 >>> --- a/net/colo-compare.c >>> +++ b/net/colo-compare.c >>> @@ -1324,7 +1324,7 @@ static void >> colo_compare_complete(UserCreatable *uc, Error **errp) >>> s->connection_track_table = >> g_hash_table_new_full(connection_key_hash, >>> connection_key_equal, >>> g_free, >>> - connection_destroy); >>> + NULL); >> >> 202 /* if not found, create a new connection and add to hash table */ >> 203 Connection *connection_get(GHashTable *connection_track_table, >> 204 ConnectionKey *key, >> 205 GQueue *conn_list) >> 206 { >> 207 Connection *conn = g_hash_table_lookup(connection_track_table, >> key); >> 208 >> 209 if (conn == NULL) { >> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); >> 211 >> 212 conn = connection_new(key); >> 213 >> 214 if (g_hash_table_size(connection_track_table) > >> HASHTABLE_MAX_SIZE) { >> 215 trace_colo_proxy_main("colo proxy connection hashtable full," >> 216 " clear it"); >> 217 connection_hashtable_reset(connection_track_table); >> >> 197 void connection_hashtable_reset(GHashTable *connection_track_table) >> 198 { >> 199 g_hash_table_remove_all(connection_track_table); >> 200 } >> >> IIUC, above subroutine will do some cleanup explicitly. And before your >> patch, connection_hashtable_reset() will release all keys and their values in >> this hashtable. But now, you remove all keys and just one value(conn_list) >> instead. Does it means other values will be leaked ? > Thanks Zhijian. Re-think about it. Yes, I think you are right. > It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation. > What do you think? > > Thanks > Chen > > >> >> 218 /* >> 219 * clear the conn_list >> 220 */ >> 221 while (!g_queue_is_empty(conn_list)) { >> 222 connection_destroy(g_queue_pop_head(conn_list)); >> 223 } >> 224 } >> 225 >> 226 g_hash_table_insert(connection_track_table, new_key, conn); >> 227 } >> 228 >> 229 return conn; >> 230 } >> >> >> Thanks >> Zhijian >> >>> colo_compare_iothread(s); >>> >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index >>> bf05023dc3..c18c4c2019 100644 >>> --- a/net/filter-rewriter.c >>> +++ b/net/filter-rewriter.c >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, >> Error **errp) >>> s->connection_track_table = >> g_hash_table_new_full(connection_key_hash, >>> connection_key_equal, >>> g_free, >>> - connection_destroy); >>> + NULL); >>> s->incoming_queue = >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); >>> } >>> ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-31 1:14 ` lizhijian @ 2022-03-31 2:25 ` Zhang, Chen 2022-04-01 1:46 ` lizhijian 0 siblings, 1 reply; 13+ messages in thread From: Zhang, Chen @ 2022-03-31 2:25 UTC (permalink / raw) To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu > -----Original Message----- > From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > Sent: Thursday, March 31, 2022 9:15 AM > To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > <jasowang@redhat.com> > Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com> > Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the > conn_list > > > connection_track_table > -----+---------- > key1 | conn |-----------------------------------------------------------+ > -----+---------- | > key2 | conn |----------------------------------+ | > -----+---------- | | > key3 | conn |---------+ | | > -----+---------- | | | > | | | > | | | > + CompareState ++ | | | > | | V V V > +---------------+ +---------------+ +---------------+ > |conn_list +--->conn +--------->conn | connx > +---------------+ +---------------+ +---------------+ > | | | | | | > +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ > |primary | |secondary |primary | |secondary > |packet | |packet + |packet | |packet + > +--------+ +--------+ +--------+ +--------+ > | | | | > +---v----+ +---v----+ +---v----+ +---v----+ > |primary | |secondary |primary | |secondary > |packet | |packet + |packet | |packet + > +--------+ +--------+ +--------+ +--------+ > | | | | > +---v----+ +---v----+ +---v----+ +---v----+ > |primary | |secondary |primary | |secondary > |packet | |packet + |packet | |packet + > +--------+ +--------+ +--------+ +--------+ > > I recalled that we should above relationships between > connection_track_table conn_list and conn. > That means both connection_track_table and conn_list reference to the > same conn instance. > > So before this patch, connection_get() is possible to use-after-free/double > free conn. where 1st was in > connection_hashtable_reset() and 2nd was > 221 while (!g_queue_is_empty(conn_list)) { > 222 connection_destroy(g_queue_pop_head(conn_list)); > 223 } > > I also doubt that your current abort was just due to above use-after- > free/double free. > If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd > place. Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in: 221 while (!g_queue_is_empty(conn_list)) { 222 connection_destroy(g_queue_pop_head(conn_list)); 223 }. It also avoid use-after-free/double free conn. Maybe we can keep the original version to fix it? Thanks Chen > > Thanks > Zhijian > > > On 28/03/2022 17:13, Zhang, Chen wrote: > > > >> -----Original Message----- > >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > >> Sent: Monday, March 21, 2022 11:06 AM > >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > >> <jasowang@redhat.com>; lizhijian@fujitsu.com > >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu > >> <like.xu@linux.intel.com> > >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear > >> the conn_list > >> > >> > >> > >> On 09/03/2022 16:38, Zhang Chen wrote: > >>> We notice the QEMU may crash when the guest has too many incoming > >>> network connections with the following log: > >>> > >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection > >>> hashtable full, clear it > >>> free(): invalid pointer > >>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... > >>> > >>> This is because we create the s->connection_track_table with > >>> g_hash_table_new_full() which is defined as: > >>> > >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, > >>> GEqualFunc key_equal_func, > >>> GDestroyNotify key_destroy_func, > >>> GDestroyNotify value_destroy_func); > >>> > >>> The fourth parameter connection_destroy() will be called to free the > >>> memory allocated for all 'Connection' values in the hashtable when > >>> we call g_hash_table_remove_all() in the connection_hashtable_reset(). > >>> > >>> It's unnecessary because we clear the conn_list explicitly later, > >>> and it's buggy when other agents try to call connection_get() with > >>> the same connection_track_table. > >>> > >>> Signed-off-by: Like Xu <like.xu@linux.intel.com> > >>> Signed-off-by: Zhang Chen <chen.zhang@intel.com> > >>> --- > >>> net/colo-compare.c | 2 +- > >>> net/filter-rewriter.c | 2 +- > >>> 2 files changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index > >>> 62554b5b3c..ab054cfd21 100644 > >>> --- a/net/colo-compare.c > >>> +++ b/net/colo-compare.c > >>> @@ -1324,7 +1324,7 @@ static void > >> colo_compare_complete(UserCreatable *uc, Error **errp) > >>> s->connection_track_table = > >> g_hash_table_new_full(connection_key_hash, > >>> connection_key_equal, > >>> g_free, > >>> - connection_destroy); > >>> + NULL); > >> > >> 202 /* if not found, create a new connection and add to hash table */ > >> 203 Connection *connection_get(GHashTable *connection_track_table, > >> 204 ConnectionKey *key, > >> 205 GQueue *conn_list) > >> 206 { > >> 207 Connection *conn = > >> g_hash_table_lookup(connection_track_table, > >> key); > >> 208 > >> 209 if (conn == NULL) { > >> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); > >> 211 > >> 212 conn = connection_new(key); > >> 213 > >> 214 if (g_hash_table_size(connection_track_table) > > >> HASHTABLE_MAX_SIZE) { > >> 215 trace_colo_proxy_main("colo proxy connection hashtable full," > >> 216 " clear it"); > >> 217 connection_hashtable_reset(connection_track_table); > >> > >> 197 void connection_hashtable_reset(GHashTable > >> *connection_track_table) > >> 198 { > >> 199 g_hash_table_remove_all(connection_track_table); > >> 200 } > >> > >> IIUC, above subroutine will do some cleanup explicitly. And before > >> your patch, connection_hashtable_reset() will release all keys and > >> their values in this hashtable. But now, you remove all keys and just > >> one value(conn_list) instead. Does it means other values will be leaked ? > > Thanks Zhijian. Re-think about it. Yes, I think you are right. > > It looks need a lock to avoid into connection_get() when triggered the clear > hashtable operation. > > What do you think? > > > > Thanks > > Chen > > > > > >> > >> 218 /* > >> 219 * clear the conn_list > >> 220 */ > >> 221 while (!g_queue_is_empty(conn_list)) { > >> 222 connection_destroy(g_queue_pop_head(conn_list)); > >> 223 } > >> 224 } > >> 225 > >> 226 g_hash_table_insert(connection_track_table, new_key, > >> conn); > >> 227 } > >> 228 > >> 229 return conn; > >> 230 } > >> > >> > >> Thanks > >> Zhijian > >> > >>> colo_compare_iothread(s); > >>> > >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index > >>> bf05023dc3..c18c4c2019 100644 > >>> --- a/net/filter-rewriter.c > >>> +++ b/net/filter-rewriter.c > >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState > >>> *nf, > >> Error **errp) > >>> s->connection_track_table = > >> g_hash_table_new_full(connection_key_hash, > >>> connection_key_equal, > >>> g_free, > >>> - connection_destroy); > >>> + NULL); > >>> s->incoming_queue = > >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); > >>> } > >>> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-03-31 2:25 ` Zhang, Chen @ 2022-04-01 1:46 ` lizhijian 2022-04-01 3:33 ` Zhang, Chen 0 siblings, 1 reply; 13+ messages in thread From: lizhijian @ 2022-04-01 1:46 UTC (permalink / raw) To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu On 31/03/2022 10:25, Zhang, Chen wrote: > >> -----Original Message----- >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> >> Sent: Thursday, March 31, 2022 9:15 AM >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang >> <jasowang@redhat.com> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the >> conn_list >> >> >> connection_track_table >> -----+---------- >> key1 | conn |-----------------------------------------------------------+ >> -----+---------- | >> key2 | conn |----------------------------------+ | >> -----+---------- | | >> key3 | conn |---------+ | | >> -----+---------- | | | >> | | | >> | | | >> + CompareState ++ | | | >> | | V V V >> +---------------+ +---------------+ +---------------+ >> |conn_list +--->conn +--------->conn | connx >> +---------------+ +---------------+ +---------------+ >> | | | | | | >> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> | | | | >> +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> | | | | >> +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> >> I recalled that we should above relationships between >> connection_track_table conn_list and conn. >> That means both connection_track_table and conn_list reference to the >> same conn instance. >> >> So before this patch, connection_get() is possible to use-after-free/double >> free conn. where 1st was in >> connection_hashtable_reset() and 2nd was >> 221 while (!g_queue_is_empty(conn_list)) { >> 222 connection_destroy(g_queue_pop_head(conn_list)); >> 223 } >> >> I also doubt that your current abort was just due to above use-after- >> free/double free. >> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd >> place. > Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in: > 221 while (!g_queue_is_empty(conn_list)) { > 222 connection_destroy(g_queue_pop_head(conn_list)); > 223 }. > It also avoid use-after-free/double free conn. Although you will not use-after-free here, you have to consider other situations carefully that g_hash_table_remove_all() g_hash_table_destroy() were called where the conn_list should also be freed with you approach. > Maybe we can keep the original version to fix it? And your commit log should be more clear. Thanks Zhijian > > Thanks > Chen > >> Thanks >> Zhijian >> >> >> On 28/03/2022 17:13, Zhang, Chen wrote: >>>> -----Original Message----- >>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> >>>> Sent: Monday, March 21, 2022 11:06 AM >>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang >>>> <jasowang@redhat.com>; lizhijian@fujitsu.com >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu >>>> <like.xu@linux.intel.com> >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear >>>> the conn_list >>>> >>>> >>>> >>>> On 09/03/2022 16:38, Zhang Chen wrote: >>>>> We notice the QEMU may crash when the guest has too many incoming >>>>> network connections with the following log: >>>>> >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection >>>>> hashtable full, clear it >>>>> free(): invalid pointer >>>>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... >>>>> >>>>> This is because we create the s->connection_track_table with >>>>> g_hash_table_new_full() which is defined as: >>>>> >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, >>>>> GEqualFunc key_equal_func, >>>>> GDestroyNotify key_destroy_func, >>>>> GDestroyNotify value_destroy_func); >>>>> >>>>> The fourth parameter connection_destroy() will be called to free the >>>>> memory allocated for all 'Connection' values in the hashtable when >>>>> we call g_hash_table_remove_all() in the connection_hashtable_reset(). >>>>> >>>>> It's unnecessary because we clear the conn_list explicitly later, >>>>> and it's buggy when other agents try to call connection_get() with >>>>> the same connection_track_table. >>>>> >>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com> >>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com> >>>>> --- >>>>> net/colo-compare.c | 2 +- >>>>> net/filter-rewriter.c | 2 +- >>>>> 2 files changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index >>>>> 62554b5b3c..ab054cfd21 100644 >>>>> --- a/net/colo-compare.c >>>>> +++ b/net/colo-compare.c >>>>> @@ -1324,7 +1324,7 @@ static void >>>> colo_compare_complete(UserCreatable *uc, Error **errp) >>>>> s->connection_track_table = >>>> g_hash_table_new_full(connection_key_hash, >>>>> connection_key_equal, >>>>> g_free, >>>>> - connection_destroy); >>>>> + NULL); >>>> 202 /* if not found, create a new connection and add to hash table */ >>>> 203 Connection *connection_get(GHashTable *connection_track_table, >>>> 204 ConnectionKey *key, >>>> 205 GQueue *conn_list) >>>> 206 { >>>> 207 Connection *conn = >>>> g_hash_table_lookup(connection_track_table, >>>> key); >>>> 208 >>>> 209 if (conn == NULL) { >>>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); >>>> 211 >>>> 212 conn = connection_new(key); >>>> 213 >>>> 214 if (g_hash_table_size(connection_track_table) > >>>> HASHTABLE_MAX_SIZE) { >>>> 215 trace_colo_proxy_main("colo proxy connection hashtable full," >>>> 216 " clear it"); >>>> 217 connection_hashtable_reset(connection_track_table); >>>> >>>> 197 void connection_hashtable_reset(GHashTable >>>> *connection_track_table) >>>> 198 { >>>> 199 g_hash_table_remove_all(connection_track_table); >>>> 200 } >>>> >>>> IIUC, above subroutine will do some cleanup explicitly. And before >>>> your patch, connection_hashtable_reset() will release all keys and >>>> their values in this hashtable. But now, you remove all keys and just >>>> one value(conn_list) instead. Does it means other values will be leaked ? >>> Thanks Zhijian. Re-think about it. Yes, I think you are right. >>> It looks need a lock to avoid into connection_get() when triggered the clear >> hashtable operation. >>> What do you think? >>> >>> Thanks >>> Chen >>> >>> >>>> 218 /* >>>> 219 * clear the conn_list >>>> 220 */ >>>> 221 while (!g_queue_is_empty(conn_list)) { >>>> 222 connection_destroy(g_queue_pop_head(conn_list)); >>>> 223 } >>>> 224 } >>>> 225 >>>> 226 g_hash_table_insert(connection_track_table, new_key, >>>> conn); >>>> 227 } >>>> 228 >>>> 229 return conn; >>>> 230 } >>>> >>>> >>>> Thanks >>>> Zhijian >>>> >>>>> colo_compare_iothread(s); >>>>> >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index >>>>> bf05023dc3..c18c4c2019 100644 >>>>> --- a/net/filter-rewriter.c >>>>> +++ b/net/filter-rewriter.c >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState >>>>> *nf, >>>> Error **errp) >>>>> s->connection_track_table = >>>> g_hash_table_new_full(connection_key_hash, >>>>> connection_key_equal, >>>>> g_free, >>>>> - connection_destroy); >>>>> + NULL); >>>>> s->incoming_queue = >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); >>>>> } >>>>> ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list 2022-04-01 1:46 ` lizhijian @ 2022-04-01 3:33 ` Zhang, Chen 0 siblings, 0 replies; 13+ messages in thread From: Zhang, Chen @ 2022-04-01 3:33 UTC (permalink / raw) To: lizhijian@fujitsu.com, Jason Wang; +Cc: qemu-dev, Like Xu > -----Original Message----- > From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > Sent: Friday, April 1, 2022 9:47 AM > To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > <jasowang@redhat.com> > Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com> > Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the > conn_list > > > > On 31/03/2022 10:25, Zhang, Chen wrote: > > > >> -----Original Message----- > >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > >> Sent: Thursday, March 31, 2022 9:15 AM > >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > >> <jasowang@redhat.com> > >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu > >> <like.xu@linux.intel.com> > >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear > >> the conn_list > >> > >> > >> connection_track_table > >> -----+---------- > >> key1 | conn |-----------------------------------------------------------+ > >> -----+---------- | > >> key2 | conn |----------------------------------+ | > >> -----+---------- | | > >> key3 | conn |---------+ | | > >> -----+---------- | | | > >> | | | > >> | | | > >> + CompareState ++ | | | > >> | | V V V > >> +---------------+ +---------------+ +---------------+ > >> |conn_list +--->conn +--------->conn | connx > >> +---------------+ +---------------+ +---------------+ > >> | | | | | | > >> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ > >> |primary | |secondary |primary | |secondary > >> |packet | |packet + |packet | |packet + > >> +--------+ +--------+ +--------+ +--------+ > >> | | | | > >> +---v----+ +---v----+ +---v----+ +---v----+ > >> |primary | |secondary |primary | |secondary > >> |packet | |packet + |packet | |packet + > >> +--------+ +--------+ +--------+ +--------+ > >> | | | | > >> +---v----+ +---v----+ +---v----+ +---v----+ > >> |primary | |secondary |primary | |secondary > >> |packet | |packet + |packet | |packet + > >> +--------+ +--------+ +--------+ +--------+ > >> > >> I recalled that we should above relationships between > >> connection_track_table conn_list and conn. > >> That means both connection_track_table and conn_list reference to the > >> same conn instance. > >> > >> So before this patch, connection_get() is possible to > >> use-after-free/double free conn. where 1st was in > >> connection_hashtable_reset() and 2nd was > >> 221 while (!g_queue_is_empty(conn_list)) { > >> 222 connection_destroy(g_queue_pop_head(conn_list)); > >> 223 } > >> > >> I also doubt that your current abort was just due to above use-after- > >> free/double free. > >> If so, looks it's enough we just update to g_queue_clear(conn_list) > >> in the 2nd place. > > Make sense, but It also means the original patch works here, skip free conn > in connection_hashtable_reset() and do it in: > > 221 while (!g_queue_is_empty(conn_list)) { > > 222 connection_destroy(g_queue_pop_head(conn_list)); > > 223 }. > > It also avoid use-after-free/double free conn. > Although you will not use-after-free here, you have to consider other > situations carefully that > g_hash_table_remove_all() g_hash_table_destroy() were called where the > conn_list should also be freed with you approach. > > I re-checked the code, it looks fine to me. > > > > Maybe we can keep the original version to fix it? > And your commit log should be more clear. OK, I will update V2 for commit log. Thanks Chen > > Thanks > Zhijian > > > > > Thanks > > Chen > > > >> Thanks > >> Zhijian > >> > >> > >> On 28/03/2022 17:13, Zhang, Chen wrote: > >>>> -----Original Message----- > >>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com> > >>>> Sent: Monday, March 21, 2022 11:06 AM > >>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > >>>> <jasowang@redhat.com>; lizhijian@fujitsu.com > >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu > >>>> <like.xu@linux.intel.com> > >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to > >>>> clear the conn_list > >>>> > >>>> > >>>> > >>>> On 09/03/2022 16:38, Zhang Chen wrote: > >>>>> We notice the QEMU may crash when the guest has too many > incoming > >>>>> network connections with the following log: > >>>>> > >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection > >>>>> hashtable full, clear it > >>>>> free(): invalid pointer > >>>>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... > >>>>> > >>>>> This is because we create the s->connection_track_table with > >>>>> g_hash_table_new_full() which is defined as: > >>>>> > >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, > >>>>> GEqualFunc key_equal_func, > >>>>> GDestroyNotify key_destroy_func, > >>>>> GDestroyNotify value_destroy_func); > >>>>> > >>>>> The fourth parameter connection_destroy() will be called to free > >>>>> the memory allocated for all 'Connection' values in the hashtable > >>>>> when we call g_hash_table_remove_all() in the > connection_hashtable_reset(). > >>>>> > >>>>> It's unnecessary because we clear the conn_list explicitly later, > >>>>> and it's buggy when other agents try to call connection_get() with > >>>>> the same connection_track_table. > >>>>> > >>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com> > >>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com> > >>>>> --- > >>>>> net/colo-compare.c | 2 +- > >>>>> net/filter-rewriter.c | 2 +- > >>>>> 2 files changed, 2 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index > >>>>> 62554b5b3c..ab054cfd21 100644 > >>>>> --- a/net/colo-compare.c > >>>>> +++ b/net/colo-compare.c > >>>>> @@ -1324,7 +1324,7 @@ static void > >>>> colo_compare_complete(UserCreatable *uc, Error **errp) > >>>>> s->connection_track_table = > >>>> g_hash_table_new_full(connection_key_hash, > >>>>> connection_key_equal, > >>>>> g_free, > >>>>> - connection_destroy); > >>>>> + NULL); > >>>> 202 /* if not found, create a new connection and add to hash table > >>>> */ > >>>> 203 Connection *connection_get(GHashTable *connection_track_table, > >>>> 204 ConnectionKey *key, > >>>> 205 GQueue *conn_list) > >>>> 206 { > >>>> 207 Connection *conn = > >>>> g_hash_table_lookup(connection_track_table, > >>>> key); > >>>> 208 > >>>> 209 if (conn == NULL) { > >>>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); > >>>> 211 > >>>> 212 conn = connection_new(key); > >>>> 213 > >>>> 214 if (g_hash_table_size(connection_track_table) > > >>>> HASHTABLE_MAX_SIZE) { > >>>> 215 trace_colo_proxy_main("colo proxy connection hashtable > full," > >>>> 216 " clear it"); > >>>> 217 connection_hashtable_reset(connection_track_table); > >>>> > >>>> 197 void connection_hashtable_reset(GHashTable > >>>> *connection_track_table) > >>>> 198 { > >>>> 199 g_hash_table_remove_all(connection_track_table); > >>>> 200 } > >>>> > >>>> IIUC, above subroutine will do some cleanup explicitly. And before > >>>> your patch, connection_hashtable_reset() will release all keys and > >>>> their values in this hashtable. But now, you remove all keys and > >>>> just one value(conn_list) instead. Does it means other values will be > leaked ? > >>> Thanks Zhijian. Re-think about it. Yes, I think you are right. > >>> It looks need a lock to avoid into connection_get() when triggered > >>> the clear > >> hashtable operation. > >>> What do you think? > >>> > >>> Thanks > >>> Chen > >>> > >>> > >>>> 218 /* > >>>> 219 * clear the conn_list > >>>> 220 */ > >>>> 221 while (!g_queue_is_empty(conn_list)) { > >>>> 222 connection_destroy(g_queue_pop_head(conn_list)); > >>>> 223 } > >>>> 224 } > >>>> 225 > >>>> 226 g_hash_table_insert(connection_track_table, new_key, > >>>> conn); > >>>> 227 } > >>>> 228 > >>>> 229 return conn; > >>>> 230 } > >>>> > >>>> > >>>> Thanks > >>>> Zhijian > >>>> > >>>>> colo_compare_iothread(s); > >>>>> > >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index > >>>>> bf05023dc3..c18c4c2019 100644 > >>>>> --- a/net/filter-rewriter.c > >>>>> +++ b/net/filter-rewriter.c > >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState > >>>>> *nf, > >>>> Error **errp) > >>>>> s->connection_track_table = > >>>> g_hash_table_new_full(connection_key_hash, > >>>>> connection_key_equal, > >>>>> g_free, > >>>>> - connection_destroy); > >>>>> + NULL); > >>>>> s->incoming_queue = > >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); > >>>>> } > >>>>> ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter 2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen 2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen 2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen @ 2022-03-09 8:38 ` Zhang Chen 2022-03-21 3:16 ` lizhijian 2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen 3 siblings, 1 reply; 13+ messages in thread From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw) To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev Filter-rewriter no need to track connection in conn_list. This patch fix the glib g_queue_is_empty assertion when COLO guest keep a lot of network connection. Signed-off-by: Zhang Chen <chen.zhang@intel.com> --- net/colo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo.c b/net/colo.c index 1f8162f59f..694f3c93ef 100644 --- a/net/colo.c +++ b/net/colo.c @@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table, /* * clear the conn_list */ - while (!g_queue_is_empty(conn_list)) { + while (conn_list && !g_queue_is_empty(conn_list)) { connection_destroy(g_queue_pop_head(conn_list)); } } -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter 2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen @ 2022-03-21 3:16 ` lizhijian 0 siblings, 0 replies; 13+ messages in thread From: lizhijian @ 2022-03-21 3:16 UTC (permalink / raw) To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: qemu-dev On 09/03/2022 16:38, Zhang Chen wrote: > Filter-rewriter no need to track connection in conn_list. > This patch fix the glib g_queue_is_empty assertion when COLO guest > keep a lot of network connection. > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> LGTM. Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> > --- > net/colo.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/colo.c b/net/colo.c > index 1f8162f59f..694f3c93ef 100644 > --- a/net/colo.c > +++ b/net/colo.c > @@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table, > /* > * clear the conn_list > */ > - while (!g_queue_is_empty(conn_list)) { > + while (conn_list && !g_queue_is_empty(conn_list)) { > connection_destroy(g_queue_pop_head(conn_list)); > } > } ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly 2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen ` (2 preceding siblings ...) 2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen @ 2022-03-09 8:38 ` Zhang Chen 2022-03-21 3:39 ` lizhijian 3 siblings, 1 reply; 13+ messages in thread From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw) To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, Tao Xu, qemu-dev When COLO use only one vnet_hdr_support parameter between filter-redirector and filter-mirror(or colo-compare), COLO will crash with segmentation fault. Back track as follow: Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0) at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto); (gdb) bt 0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0) at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at net/colo.c:49 2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at net/filter-rewriter.c:63 So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to raise error and add trace-events to track vnet_hdr_len. Signed-off-by: Tao Xu <tao3.xu@intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> --- net/colo.c | 9 ++++++++- net/trace-events | 1 + 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/net/colo.c b/net/colo.c index 694f3c93ef..6b0ff562ad 100644 --- a/net/colo.c +++ b/net/colo.c @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt) static const uint8_t vlan[] = {0x81, 0x00}; uint8_t *data = pkt->data + pkt->vnet_hdr_len; uint16_t l3_proto; - ssize_t l2hdr_len = eth_get_l2_hdr_length(data); + ssize_t l2hdr_len; + + if (data == NULL) { + trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, " + "pkt->vnet_hdr_len", pkt->vnet_hdr_len); + return 1; + } + l2hdr_len = eth_get_l2_hdr_length(data); if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) { trace_colo_proxy_main("pkt->size < ETH_HLEN"); diff --git a/net/trace-events b/net/trace-events index d7a17256cc..6af927b4b9 100644 --- a/net/trace-events +++ b/net/trace-events @@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d" # colo.c colo_proxy_main(const char *chr) ": %s" +colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d" # colo-compare.c colo_compare_main(const char *chr) ": %s" -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly 2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen @ 2022-03-21 3:39 ` lizhijian 0 siblings, 0 replies; 13+ messages in thread From: lizhijian @ 2022-03-21 3:39 UTC (permalink / raw) To: Zhang Chen, Jason Wang, lizhijian@fujitsu.com; +Cc: Tao Xu, qemu-dev On 09/03/2022 16:38, Zhang Chen wrote: > When COLO use only one vnet_hdr_support parameter between > filter-redirector and filter-mirror(or colo-compare), COLO will crash > with segmentation fault. Back track as follow: > > Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. > 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0) > at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 > 296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto); > (gdb) bt > 0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0) > at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 > 1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at > net/colo.c:49 > 2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at > net/filter-rewriter.c:63 > > So wrong vnet_hdr_len will cause pkt->data become NULL. Not sure if we can check this earlier, well Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> > Add check to > raise error and add trace-events to track vnet_hdr_len. > > Signed-off-by: Tao Xu <tao3.xu@intel.com> > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > --- > net/colo.c | 9 ++++++++- > net/trace-events | 1 + > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/net/colo.c b/net/colo.c > index 694f3c93ef..6b0ff562ad 100644 > --- a/net/colo.c > +++ b/net/colo.c > @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt) > static const uint8_t vlan[] = {0x81, 0x00}; > uint8_t *data = pkt->data + pkt->vnet_hdr_len; > uint16_t l3_proto; > - ssize_t l2hdr_len = eth_get_l2_hdr_length(data); > + ssize_t l2hdr_len; > + > + if (data == NULL) { > + trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, " > + "pkt->vnet_hdr_len", pkt->vnet_hdr_len); > + return 1; > + } > + l2hdr_len = eth_get_l2_hdr_length(data); > > if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) { > trace_colo_proxy_main("pkt->size < ETH_HLEN"); > diff --git a/net/trace-events b/net/trace-events > index d7a17256cc..6af927b4b9 100644 > --- a/net/trace-events > +++ b/net/trace-events > @@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d" > > # colo.c > colo_proxy_main(const char *chr) ": %s" > +colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d" > > # colo-compare.c > colo_compare_main(const char *chr) ": %s" ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-04-01 3:34 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen 2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen 2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen 2022-03-21 3:05 ` lizhijian 2022-03-28 9:13 ` Zhang, Chen 2022-03-31 1:14 ` lizhijian 2022-03-31 2:25 ` Zhang, Chen 2022-04-01 1:46 ` lizhijian 2022-04-01 3:33 ` Zhang, Chen 2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen 2022-03-21 3:16 ` lizhijian 2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen 2022-03-21 3:39 ` lizhijian
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).