[Qemu-devel] transient failure in the test-qht tests

* [Qemu-devel] transient failure in the test-qht tests
@ 2016-08-24 20:39 Peter Maydell
  2016-08-24 23:44 ` Emilio G. Cota
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2016-08-24 20:39 UTC (permalink / raw)
  To: QEMU Developers, Emilio G. Cota

So I encountered this test failure running 'make check' on
32-bit ARM:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
--verbose -m=quick tests/test-qht
TEST: tests/test-qht... (pid=15763)
  /qht/mode/default:                                                   OK
  /qht/mode/resize:                                                    FAIL
GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
(pid=16462)
FAIL: tests/test-qht

The test suite passed on a rerun.

Any ideas?

thanks
-- PMM

* Re: [Qemu-devel] transient failure in the test-qht tests
  2016-08-24 20:39 [Qemu-devel] transient failure in the test-qht tests Peter Maydell
@ 2016-08-24 23:44 ` Emilio G. Cota
  2016-08-24 23:52   ` Peter Maydell
  0 siblings, 1 reply; 9+ messages in thread
From: Emilio G. Cota @ 2016-08-24 23:44 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Wed, Aug 24, 2016 at 21:39:01 +0100, Peter Maydell wrote:
> So I encountered this test failure running 'make check' on
> 32-bit ARM:
> 
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
> --verbose -m=quick tests/test-qht
> TEST: tests/test-qht... (pid=15763)
>   /qht/mode/default:                                                   OK
>   /qht/mode/resize:                                                    FAIL
> GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
> (pid=16462)
> FAIL: tests/test-qht
> 
> The test suite passed on a rerun.
> 
> Any ideas?

I wonder whether MALLOC_PERTURB_ had something to do with the failure, because
-ENOMEM is unlikely (I only see a few MB of peak memory usage for test-qht).

However, I just ran test-qht under valgrind on an i686 machine, and it comes
up clean.

I also brute-forced this to see if a particular perturb value would
make it fail:
  for i in $(seq 0 255); do \
    echo $i && \
    MALLOC_PERTURB_=$i gtester -k --verbose -m=quick tests/test-qht \
    --seed=R02S08efd89fe4d862dd0191c13d5ce4d76e || break; \
  done

I get no failures on either i686 or x86_64, with or without that --seed flag.

Is there any chance of getting a core dump for the failure you encountered?

Thanks,

		Emilio

* Re: [Qemu-devel] transient failure in the test-qht tests
  2016-08-24 23:44 ` Emilio G. Cota
@ 2016-08-24 23:52   ` Peter Maydell
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2016-08-24 23:52 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: QEMU Developers

On 25 August 2016 at 00:44, Emilio G. Cota <cota@braap.org> wrote:
> On Wed, Aug 24, 2016 at 21:39:01 +0100, Peter Maydell wrote:
>> So I encountered this test failure running 'make check' on
>> 32-bit ARM:
>>
>> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
>> --verbose -m=quick tests/test-qht
>> TEST: tests/test-qht... (pid=15763)
>>   /qht/mode/default:                                                   OK
>>   /qht/mode/resize:                                                    FAIL
>> GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
>> (pid=16462)
>> FAIL: tests/test-qht
>>
>> The test suite passed on a rerun.
>>
>> Any ideas?
>
> I wonder whether MALLOC_PERTURB_ had something to do with the failure, because
> -ENOMEM is unlikely (I only see a few MB of peak memory usage for test-qht).
>
> However, I just ran test-qht under valgrind on an i686 machine, and it comes
> up clean.
>
> I also brute-forced this to see if a particular perturb value would
> make it fail:
>   for i in $(seq 0 255); do \
>     echo $i && \
>     MALLOC_PERTURB_=$i gtester -k --verbose -m=quick tests/test-qht \
>     --seed=R02S08efd89fe4d862dd0191c13d5ce4d76e || break; \
>   done
>
> I get no failures on either i686 or x86_64, with or without that --seed flag.
>
> Is there any chance of getting a core dump for the failure you encountered?

Unfortunately not; the test config doesn't save core dumps. In any case
I assume from the output that the test didn't actually dump core; it
just failed (without saying anything about why).

thanks
-- PMM

* [Qemu-devel] [PATCH 0/3] qht fixes
  2016-08-24 23:52   ` Peter Maydell
@ 2016-10-05 22:34     ` Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
                         ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P . Berrange, Paolo Bonzini, Alex Bennée,
	Richard Henderson, QEMU Developers

Patch 1 fixes a warning that gcc may unnecessarily emit.

Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
Daniel reported it yesterday on IRC; the trick to triggering it easily is to
run on RHEL6 (or CentOS6).
It is very likely that this is the test-qht failure that Peter reported here:
  https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
(we cannot be 100% sure due to the lack of output there; however, the fact that
it is the resize test that fails supports the hypothesis that we are indeed
hitting the same bug.)
I'm therefore adding Peter's Reported-by tag to the patch along with Daniel's.

Patch 3 is merely good practice, since test-qht is single-threaded. However,
I like having it, since test-qht serves as a de facto usage example
of qht.

Since patch 2 does not apply cleanly without patch 1, I propose merging
both patches 1 and 2 into 2.7-stable. I'll send them to qemu-stable once
the series has been picked up for master.

Thanks,

		Emilio

* [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
@ 2016-10-05 22:34       ` Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing Emilio G. Cota
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P . Berrange, Paolo Bonzini, Alex Bennée,
	Richard Henderson, QEMU Developers

gcc does not always notice that 'new' is set whenever 'resize == true',
and may therefore emit a spurious "may be used uninitialized" build
warning.

Fix it by removing 'resize' and checking directly whether 'new' is
non-NULL.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 util/qht.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/util/qht.c b/util/qht.c
index 16a8d79..af8da3c 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -410,10 +410,9 @@ void qht_reset(struct qht *ht)
 
 bool qht_reset_size(struct qht *ht, size_t n_elems)
 {
-    struct qht_map *new;
+    struct qht_map *new = NULL;
     struct qht_map *map;
     size_t n_buckets;
-    bool resize = false;
 
     n_buckets = qht_elems_to_buckets(n_elems);
 
@@ -421,18 +420,17 @@ bool qht_reset_size(struct qht *ht, size_t n_elems)
     map = ht->map;
     if (n_buckets != map->n_buckets) {
         new = qht_map_create(n_buckets);
-        resize = true;
     }
 
     qht_map_lock_buckets(map);
     qht_map_reset__all_locked(map);
-    if (resize) {
+    if (new) {
         qht_do_resize(ht, new);
     }
     qht_map_unlock_buckets(map);
     qemu_mutex_unlock(&ht->lock);
 
-    return resize;
+    return !!new;
 }
 
 static inline
-- 
2.7.4

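The idea behind patch 1 can be shown in isolation: instead of tracking a separate `resize` flag next to a pointer that is only assigned on one path, initialise the pointer to NULL and let the pointer itself say whether a new map was allocated. This is a minimal, hypothetical sketch, not qht code: `struct map`, `struct table`, `map_create()` and `reset_size()` are made-up stand-ins, there is no locking, the reset step is omitted, and the old map is freed immediately rather than through RCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct qht_map: only the bucket count matters. */
struct map {
    size_t n_buckets;
};

/* Hypothetical stand-in for struct qht, which owns the current map. */
struct table {
    struct map *map;
};

static struct map *map_create(size_t n_buckets)
{
    struct map *m = malloc(sizeof(*m));

    assert(m != NULL);
    m->n_buckets = n_buckets;
    return m;
}

/*
 * Same shape as qht_reset_size() after patch 1: 'new' starts out NULL,
 * so there is no separate flag the compiler has to correlate with it,
 * and the return value is derived from the pointer itself.
 */
static bool reset_size(struct table *t, size_t n_buckets)
{
    struct map *new = NULL;

    if (n_buckets != t->map->n_buckets) {
        new = map_create(n_buckets);
    }
    if (new) {
        /* qht hands the old map to call_rcu(); this sketch frees it now. */
        free(t->map);
        t->map = new;
    }
    return !!new;
}
```

Calling `reset_size(&t, n)` returns true only when a map of a different size was actually allocated and installed, which is exactly what the removed `resize` flag used to record.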

* [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
@ 2016-10-05 22:34       ` Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock Emilio G. Cota
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P . Berrange, Paolo Bonzini, Alex Bennée,
	Richard Henderson, QEMU Developers

The old map's bucket locks are being unlocked *after*
that same old map has been passed to RCU for destruction.
This is a bug that can cause a segfault, since there's
no guarantee that the deletion will be deferred (e.g.
there may be no concurrent readers).

The segfault is easily triggered on RHEL6/CentOS6 with test-qht,
particularly on a single-core system or when test-qht is pinned
to a single core.

Fix it by unlocking the map's bucket locks right after having
published the new map, and (crucially) before marking the map
for deletion via call_rcu().

While at it, expand qht_do_resize() to atomically do (1) a reset,
(2) a resize, or (3) a reset+resize. This simplifies the calling
code, since the new function (qht_do_resize_reset()) acquires
and releases the buckets' locks.

Note that no qht_do_reset inline is provided, since it would have
no users: qht_reset() already performs a reset without taking
ht->lock.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 util/qht.c | 49 ++++++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/util/qht.c b/util/qht.c
index af8da3c..6c61aca 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -133,7 +133,8 @@ struct qht_map {
 /* trigger a resize when n_added_buckets > n_buckets / div */
 #define QHT_NR_ADDED_BUCKETS_THRESHOLD_DIV 8
 
-static void qht_do_resize(struct qht *ht, struct qht_map *new);
+static void qht_do_resize_reset(struct qht *ht, struct qht_map *new,
+                                bool reset);
 static void qht_grow_maybe(struct qht *ht);
 
 #ifdef QHT_DEBUG
@@ -408,6 +409,16 @@ void qht_reset(struct qht *ht)
     qht_map_unlock_buckets(map);
 }
 
+static inline void qht_do_resize(struct qht *ht, struct qht_map *new)
+{
+    qht_do_resize_reset(ht, new, false);
+}
+
+static inline void qht_do_resize_and_reset(struct qht *ht, struct qht_map *new)
+{
+    qht_do_resize_reset(ht, new, true);
+}
+
 bool qht_reset_size(struct qht *ht, size_t n_elems)
 {
     struct qht_map *new = NULL;
@@ -421,13 +432,7 @@ bool qht_reset_size(struct qht *ht, size_t n_elems)
     if (n_buckets != map->n_buckets) {
         new = qht_map_create(n_buckets);
     }
-
-    qht_map_lock_buckets(map);
-    qht_map_reset__all_locked(map);
-    if (new) {
-        qht_do_resize(ht, new);
-    }
-    qht_map_unlock_buckets(map);
+    qht_do_resize_and_reset(ht, new);
     qemu_mutex_unlock(&ht->lock);
 
     return !!new;
@@ -559,9 +564,7 @@ static __attribute__((noinline)) void qht_grow_maybe(struct qht *ht)
     if (qht_map_needs_resize(map)) {
         struct qht_map *new = qht_map_create(map->n_buckets * 2);
 
-        qht_map_lock_buckets(map);
         qht_do_resize(ht, new);
-        qht_map_unlock_buckets(map);
     }
     qemu_mutex_unlock(&ht->lock);
 }
@@ -737,24 +740,31 @@ static void qht_map_copy(struct qht *ht, void *p, uint32_t hash, void *userp)
 }
 
 /*
- * Call with ht->lock and all bucket locks held.
- *
- * Creating the @new map here would add unnecessary delay while all the locks
- * are held--holding up the bucket locks is particularly bad, since no writes
- * can occur while these are held. Thus, we let callers create the new map,
- * hopefully without the bucket locks held.
+ * Atomically perform a resize and/or reset.
+ * Call with ht->lock held.
  */
-static void qht_do_resize(struct qht *ht, struct qht_map *new)
+static void qht_do_resize_reset(struct qht *ht, struct qht_map *new, bool reset)
 {
     struct qht_map *old;
 
     old = ht->map;
-    g_assert_cmpuint(new->n_buckets, !=, old->n_buckets);
+    qht_map_lock_buckets(old);
 
+    if (reset) {
+        qht_map_reset__all_locked(old);
+    }
+
+    if (new == NULL) {
+        qht_map_unlock_buckets(old);
+        return;
+    }
+
+    g_assert_cmpuint(new->n_buckets, !=, old->n_buckets);
     qht_map_iter__all_locked(ht, old, qht_map_copy, new);
     qht_map_debug__all_locked(new);
 
     atomic_rcu_set(&ht->map, new);
+    qht_map_unlock_buckets(old);
     call_rcu(old, qht_map_destroy, rcu);
 }
 
@@ -766,12 +776,9 @@ bool qht_resize(struct qht *ht, size_t n_elems)
     qemu_mutex_lock(&ht->lock);
     if (n_buckets != ht->map->n_buckets) {
         struct qht_map *new;
-        struct qht_map *old = ht->map;
 
         new = qht_map_create(n_buckets);
-        qht_map_lock_buckets(old);
         qht_do_resize(ht, new);
-        qht_map_unlock_buckets(old);
         ret = true;
     }
     qemu_mutex_unlock(&ht->lock);
-- 
2.7.4

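The ordering that patch 2 enforces can be sketched outside qht. In this hypothetical model (all names are illustrative, not qht's), one pthread mutex per map stands in for the per-bucket locks, and `retire_map()` stands in for `call_rcu(old, qht_map_destroy, rcu)` in the case the commit message warns about: with no concurrent readers, nothing delays the free. The sketch follows the same steps as `qht_do_resize_reset()`: lock the old map, optionally reset it, bail out early if there is no new map, copy, publish, unlock the old map, and only then retire it. Build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical map: one mutex stands in for qht's per-bucket locks. */
struct map {
    pthread_mutex_t lock;
    size_t n_buckets;
    size_t n_entries;
};

struct table {
    pthread_mutex_t lock;   /* plays the role of ht->lock */
    struct map *map;
};

static int maps_freed;

static struct map *map_create(size_t n_buckets)
{
    struct map *m = calloc(1, sizeof(*m));

    assert(m != NULL);
    pthread_mutex_init(&m->lock, NULL);
    m->n_buckets = n_buckets;
    return m;
}

/*
 * Stand-in for call_rcu() with no readers to wait for: the map is
 * destroyed at once, so touching 'old' afterwards is a use-after-free.
 */
static void retire_map(struct map *old)
{
    pthread_mutex_destroy(&old->lock);
    free(old);
    maps_freed++;
}

/* Call with t->lock held. 'new' may be NULL, meaning "reset only". */
static void do_resize_reset(struct table *t, struct map *new, bool reset)
{
    struct map *old = t->map;

    pthread_mutex_lock(&old->lock);
    if (reset) {
        old->n_entries = 0;
    }
    if (new == NULL) {
        pthread_mutex_unlock(&old->lock);
        return;
    }
    assert(new->n_buckets != old->n_buckets);
    new->n_entries = old->n_entries;    /* "copy" the entries (just a count) */
    t->map = new;                       /* publish; qht uses atomic_rcu_set() */
    pthread_mutex_unlock(&old->lock);   /* unlock BEFORE retiring 'old'... */
    retire_map(old);                    /* ...because this may free it */
}
```

Swapping the last two statements would reproduce the bug the patch fixes: the unlock would operate on a mutex inside memory that has already been freed.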

* [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing Emilio G. Cota
@ 2016-10-05 22:34       ` Emilio G. Cota
  2016-10-06  8:31       ` [Qemu-devel] [PATCH 0/3] qht fixes Dr. David Alan Gilbert
  2016-10-06 10:56       ` Paolo Bonzini
  4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P . Berrange, Paolo Bonzini, Alex Bennée,
	Richard Henderson, QEMU Developers

qht_lookup is meant to be called from an RCU read-side critical
section. Make sure we're in such a section in test-qht when
performing lookups, even though test-qht cannot trigger any
races in qht, since it is single-threaded.

Note that rcu_register_thread is already called by the
rcu_after_fork hook, and therefore duplicating it here would
be a bug.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/test-qht.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/test-qht.c b/tests/test-qht.c
index 46a64b6..9b7423a 100644
--- a/tests/test-qht.c
+++ b/tests/test-qht.c
@@ -6,6 +6,7 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/qht.h"
+#include "qemu/rcu.h"
 
 #define N 5000
 
@@ -51,6 +52,7 @@ static void check(int a, int b, bool expected)
     struct qht_stats stats;
     int i;
 
+    rcu_read_lock();
     for (i = a; i < b; i++) {
         void *p;
         uint32_t hash;
@@ -61,6 +63,8 @@ static void check(int a, int b, bool expected)
         p = qht_lookup(&ht, is_equal, &val, hash);
         g_assert_true(!!p == expected);
     }
+    rcu_read_unlock();
+
     qht_statistics_init(&ht, &stats);
     if (stats.used_head_buckets) {
         g_assert_cmpfloat(qdist_avg(&stats.chain), >=, 1.0);
-- 
2.7.4

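Why lookups belong inside rcu_read_lock()/rcu_read_unlock() can be illustrated with a toy, single-threaded model of deferred reclamation. This is not QEMU's RCU (which waits for a grace period in a separate thread); `toy_read_lock()`, `toy_read_unlock()` and `toy_defer_free()` are hypothetical names. The model only captures the rule that an object retired while a reader is inside its critical section must stay valid until that section ends, and that with no active reader the free may happen right away, which is the situation patch 2 has to survive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: tracks one reader's nesting depth, no real threads. */
#define MAX_PENDING 16

static int read_depth;
static void *pending[MAX_PENDING];
static int n_pending;
static int n_freed;

static void flush_pending(void)
{
    while (n_pending > 0) {
        free(pending[--n_pending]);
        n_freed++;
    }
}

static void toy_read_lock(void)
{
    read_depth++;
}

static void toy_read_unlock(void)
{
    assert(read_depth > 0);
    if (--read_depth == 0) {
        flush_pending();   /* last reader left: reclaim deferred objects */
    }
}

/* Loosely like call_rcu(): free now if no reader is active, else defer. */
static void toy_defer_free(void *p)
{
    if (read_depth == 0) {
        free(p);
        n_freed++;
        return;
    }
    assert(n_pending < MAX_PENDING);
    pending[n_pending++] = p;
}
```

A lookup performed between `toy_read_lock()` and `toy_read_unlock()` can keep dereferencing a map that a resize retired in the meantime; the same lookup outside the section has no such guarantee.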

* Re: [Qemu-devel] [PATCH 0/3] qht fixes
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
                         ` (2 preceding siblings ...)
  2016-10-05 22:34       ` [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock Emilio G. Cota
@ 2016-10-06  8:31       ` Dr. David Alan Gilbert
  2016-10-06 10:56       ` Paolo Bonzini
  4 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2016-10-06  8:31 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: Peter Maydell, Alex Bennée, Paolo Bonzini, QEMU Developers,
	Richard Henderson

* Emilio G. Cota (cota@braap.org) wrote:
> Patch 1 fixes a warning that gcc may unnecessarily emit.
> 
> Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
> Daniel reported it yesterday on IRC; the trick to triggering it easily is to
> run on RHEL6 (or CentOS6).
> It is very likely that this is the test-qht failure that Peter reported here:
>   https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
> (we cannot be 100% sure due to the lack of output there; however, the fact that
> it is the resize test that fails supports the hypothesis that we are indeed
> hitting the same bug.)
> I'm therefore adding Peter's Reported-by tag to the patch along with Daniel's.
> 
> Patch 3 is merely good practice, since test-qht is single-threaded. However,
> I like having it, since test-qht serves as a de facto usage example
> of qht.
> 
> Since patch 2 does not apply cleanly without patch 1, I propose merging
> both patches 1 and 2 into 2.7-stable. I'll send them to qemu-stable once
> the series has been picked up for master.

That seems to fix the test-qht failure I was seeing on RHEL6.

Dave

> Thanks,
> 
> 		Emilio
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH 0/3] qht fixes
  2016-10-05 22:34     ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
                         ` (3 preceding siblings ...)
  2016-10-06  8:31       ` [Qemu-devel] [PATCH 0/3] qht fixes Dr. David Alan Gilbert
@ 2016-10-06 10:56       ` Paolo Bonzini
  4 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-10-06 10:56 UTC (permalink / raw)
  To: Emilio G. Cota, Peter Maydell
  Cc: Daniel P . Berrange, Alex Bennée, Richard Henderson,
	QEMU Developers



On 06/10/2016 00:34, Emilio G. Cota wrote:
> Patch 1 fixes a warning that gcc may unnecessarily emit.
> 
> Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
> Daniel reported it yesterday on IRC; the trick to triggering it easily is to
> run on RHEL6 (or CentOS6).
> It is very likely that this is the test-qht failure that Peter reported here:
>   https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
> (we cannot be 100% sure due to the lack of output there; however, the fact that
> it is the resize test that fails supports the hypothesis that we are indeed
> hitting the same bug.)
> I'm therefore adding Peter's Reported-by tag to the patch along with Daniel's.
> 
> Patch 3 is merely good practice, since test-qht is single-threaded. However,
> I like having it, since test-qht serves as a de facto usage example
> of qht.
> 
> Since patch 2 does not apply cleanly without patch 1, I propose merging
> both patches 1 and 2 into 2.7-stable. I'll send them to qemu-stable once
> the series has been picked up for master.
> 
> Thanks,
> 
> 		Emilio
> 

Queued, thanks.

Paolo
