* [Qemu-devel] transient failure in the test-qht tests
@ 2016-08-24 20:39 Peter Maydell
2016-08-24 23:44 ` Emilio G. Cota
0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2016-08-24 20:39 UTC (permalink / raw)
To: QEMU Developers, Emilio G. Cota
So I encountered this test failure running 'make check' on
32-bit ARM:
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
--verbose -m=quick tests/test-qht
TEST: tests/test-qht... (pid=15763)
/qht/mode/default: OK
/qht/mode/resize: FAIL
GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
(pid=16462)
FAIL: tests/test-qht
The test suite passed on a rerun.
Any ideas?
thanks
-- PMM
* Re: [Qemu-devel] transient failure in the test-qht tests
2016-08-24 20:39 [Qemu-devel] transient failure in the test-qht tests Peter Maydell
@ 2016-08-24 23:44 ` Emilio G. Cota
2016-08-24 23:52 ` Peter Maydell
0 siblings, 1 reply; 9+ messages in thread
From: Emilio G. Cota @ 2016-08-24 23:44 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers
On Wed, Aug 24, 2016 at 21:39:01 +0100, Peter Maydell wrote:
> So I encountered this test failure running 'make check' on
> 32-bit ARM:
>
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
> --verbose -m=quick tests/test-qht
> TEST: tests/test-qht... (pid=15763)
> /qht/mode/default: OK
> /qht/mode/resize: FAIL
> GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
> (pid=16462)
> FAIL: tests/test-qht
>
> The test suite passed on a rerun.
>
> Any ideas?
I wonder whether MALLOC_PERTURB_ had anything to do with the failure; -ENOMEM
is unlikely (I only see a few MB of peak memory usage for test-qht).
However, I just ran test-qht under valgrind on an i686 machine, and it comes
out clean.
I also brute-forced this to see if a particular perturb value would
make it fail:
for i in $(seq 0 255); do \
echo $i && \
MALLOC_PERTURB_=$i gtester -k --verbose -m=quick tests/test-qht \
--seed=R02S08efd89fe4d862dd0191c13d5ce4d76e || break; \
done
I get no failures on either i686 or x86_64, with or without that --seed flag.
Is there any chance of getting a core dump for the failure you encountered?
Thanks,
Emilio
* Re: [Qemu-devel] transient failure in the test-qht tests
2016-08-24 23:44 ` Emilio G. Cota
@ 2016-08-24 23:52 ` Peter Maydell
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2016-08-24 23:52 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: QEMU Developers
On 25 August 2016 at 00:44, Emilio G. Cota <cota@braap.org> wrote:
> On Wed, Aug 24, 2016 at 21:39:01 +0100, Peter Maydell wrote:
>> So I encountered this test failure running 'make check' on
>> 32-bit ARM:
>>
>> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
>> --verbose -m=quick tests/test-qht
>> TEST: tests/test-qht... (pid=15763)
>> /qht/mode/default: OK
>> /qht/mode/resize: FAIL
>> GTester: last random seed: R02S08efd89fe4d862dd0191c13d5ce4d76e
>> (pid=16462)
>> FAIL: tests/test-qht
>>
>> The test suite passed on a rerun.
>>
>> Any ideas?
>
> I wonder whether MALLOC_PERTURB_ had anything to do with the failure; -ENOMEM
> is unlikely (I only see a few MB of peak memory usage for test-qht).
>
> However, I just ran test-qht under valgrind on an i686 machine, and it comes
> out clean.
>
> I also brute-forced this to see if a particular perturb value would
> make it fail:
> for i in $(seq 0 255); do \
> echo $i && \
> MALLOC_PERTURB_=$i gtester -k --verbose -m=quick tests/test-qht \
> --seed=R02S08efd89fe4d862dd0191c13d5ce4d76e || break; \
> done
>
> I get no failures on either i686 or x86_64, with or without that --seed flag.
>
> Is there any chance of getting a core dump for the failure you encountered?
Unfortunately not; the test config doesn't save core dumps. In any case,
I assume from the output that the test didn't actually dump core; it
just failed (without saying anything about why).
thanks
-- PMM
* [Qemu-devel] [PATCH 0/3] qht fixes
2016-08-24 23:52 ` Peter Maydell
@ 2016-10-05 22:34 ` Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
To: Peter Maydell
Cc: Daniel P. Berrange, Paolo Bonzini, Alex Bennée,
Richard Henderson, QEMU Developers
Patch 1 fixes a warning that gcc may unnecessarily emit.
Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
Daniel reported it yesterday on IRC; the trick to easily trigger it is to
run on RHEL6 (or CentOS6).
It is very likely that this is the test-qht failure that Peter reported here:
https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
(we cannot be 100% sure due to the lack of output there; however, the fact that
it's the resize test that fails supports the hypothesis that we're indeed hitting
the same bug.)
I'm therefore adding Peter's reported-by tag to the patch along with Daniel's.
Patch 3 is merely good practice, since test-qht is single-threaded. However
I like having it, since test-qht serves as a de facto usage example
of qht.
Given that patch 2 alone would conflict without previously applying patch 1,
I propose to merge both patches 1 & 2 to 2.7-stable. I'll send them to
qemu-stable once the patchset is picked up to be merged onto master.
Thanks,
Emilio
* [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
@ 2016-10-05 22:34 ` Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing Emilio G. Cota
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
To: Peter Maydell
Cc: Daniel P. Berrange, Paolo Bonzini, Alex Bennée,
Richard Henderson, QEMU Developers
Sometimes gcc doesn't pick up the fact that 'new' is properly
set if 'resize == true', which may generate an unnecessary
build warning.
Fix it by removing 'resize' and directly checking that 'new'
is non-NULL.
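For illustration, a stand-alone sketch of the warning-prone pattern (names are
made up and not from qht.c; the warning is most likely -Wmaybe-uninitialized):

#include <stdbool.h>

static int example(int n)
{
    int v;              /* only assigned when 'cond' is set */
    bool cond = false;

    if (n > 0) {
        v = n * 2;
        cond = true;
    }
    /* gcc may warn here that 'v' may be used uninitialized, even though
     * it is only read when 'cond' is true */
    return cond ? v : 0;
}

The patch applies the equivalent fix to qht_reset_size: drop the flag,
initialize 'new' to NULL, and test the value directly.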
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
util/qht.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/util/qht.c b/util/qht.c
index 16a8d79..af8da3c 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -410,10 +410,9 @@ void qht_reset(struct qht *ht)
bool qht_reset_size(struct qht *ht, size_t n_elems)
{
- struct qht_map *new;
+ struct qht_map *new = NULL;
struct qht_map *map;
size_t n_buckets;
- bool resize = false;
n_buckets = qht_elems_to_buckets(n_elems);
@@ -421,18 +420,17 @@ bool qht_reset_size(struct qht *ht, size_t n_elems)
map = ht->map;
if (n_buckets != map->n_buckets) {
new = qht_map_create(n_buckets);
- resize = true;
}
qht_map_lock_buckets(map);
qht_map_reset__all_locked(map);
- if (resize) {
+ if (new) {
qht_do_resize(ht, new);
}
qht_map_unlock_buckets(map);
qemu_mutex_unlock(&ht->lock);
- return resize;
+ return !!new;
}
static inline
--
2.7.4
* [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
@ 2016-10-05 22:34 ` Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock Emilio G. Cota
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
To: Peter Maydell
Cc: Daniel P. Berrange, Paolo Bonzini, Alex Bennée,
Richard Henderson, QEMU Developers
The old map's bucket locks are being unlocked *after*
that same old map has been passed to RCU for destruction.
This is a bug that can cause a segfault, since there's
no guarantee that the deletion will be deferred (e.g.
there may be no concurrent readers).
The segfault is easily triggered on RHEL6/CentOS6 with test-qht,
particularly on a single-core system or by pinning test-qht
to a single core.
Fix it by unlocking the map's bucket locks right after having
published the new map, and (crucially) before marking the map
for deletion via call_rcu().
While at it, expand qht_do_resize() to atomically do (1) a reset,
(2) a resize, or (3) a reset+resize. This simplifies the calling
code, since the new function (qht_do_resize_reset()) acquires
and releases the buckets' locks.
Note that no qht_do_reset inline is provided, since it would have
no users--qht_reset() already performs a reset without taking
ht->lock.
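For illustration, a condensed sketch of the fixed ordering; 'struct map', the
mutex and defer_free() below are stand-ins for the real qht map, the per-bucket
locks and call_rcu(), not actual qht code:

#include <pthread.h>
#include <stdlib.h>

struct map {
    pthread_mutex_t lock;   /* stands in for the per-bucket locks */
    /* ... buckets ... */
};

/* Stand-in for call_rcu(): with no concurrent readers, destruction may
 * happen right away instead of being deferred. */
static void defer_free(struct map *m)
{
    free(m);
}

static void do_resize(struct map **mapp, struct map *new)
{
    struct map *old = *mapp;

    pthread_mutex_lock(&old->lock);
    /* ... migrate entries from old into new ... */
    *mapp = new;                        /* publish the new map */
    pthread_mutex_unlock(&old->lock);   /* unlock while 'old' is still valid */
    defer_free(old);                    /* only then schedule destruction; the
                                         * buggy ordering called this before
                                         * the unlock, so the unlock could
                                         * touch freed memory */
}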
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
util/qht.c | 49 ++++++++++++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/util/qht.c b/util/qht.c
index af8da3c..6c61aca 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -133,7 +133,8 @@ struct qht_map {
/* trigger a resize when n_added_buckets > n_buckets / div */
#define QHT_NR_ADDED_BUCKETS_THRESHOLD_DIV 8
-static void qht_do_resize(struct qht *ht, struct qht_map *new);
+static void qht_do_resize_reset(struct qht *ht, struct qht_map *new,
+ bool reset);
static void qht_grow_maybe(struct qht *ht);
#ifdef QHT_DEBUG
@@ -408,6 +409,16 @@ void qht_reset(struct qht *ht)
qht_map_unlock_buckets(map);
}
+static inline void qht_do_resize(struct qht *ht, struct qht_map *new)
+{
+ qht_do_resize_reset(ht, new, false);
+}
+
+static inline void qht_do_resize_and_reset(struct qht *ht, struct qht_map *new)
+{
+ qht_do_resize_reset(ht, new, true);
+}
+
bool qht_reset_size(struct qht *ht, size_t n_elems)
{
struct qht_map *new = NULL;
@@ -421,13 +432,7 @@ bool qht_reset_size(struct qht *ht, size_t n_elems)
if (n_buckets != map->n_buckets) {
new = qht_map_create(n_buckets);
}
-
- qht_map_lock_buckets(map);
- qht_map_reset__all_locked(map);
- if (new) {
- qht_do_resize(ht, new);
- }
- qht_map_unlock_buckets(map);
+ qht_do_resize_and_reset(ht, new);
qemu_mutex_unlock(&ht->lock);
return !!new;
@@ -559,9 +564,7 @@ static __attribute__((noinline)) void qht_grow_maybe(struct qht *ht)
if (qht_map_needs_resize(map)) {
struct qht_map *new = qht_map_create(map->n_buckets * 2);
- qht_map_lock_buckets(map);
qht_do_resize(ht, new);
- qht_map_unlock_buckets(map);
}
qemu_mutex_unlock(&ht->lock);
}
@@ -737,24 +740,31 @@ static void qht_map_copy(struct qht *ht, void *p, uint32_t hash, void *userp)
}
/*
- * Call with ht->lock and all bucket locks held.
- *
- * Creating the @new map here would add unnecessary delay while all the locks
- * are held--holding up the bucket locks is particularly bad, since no writes
- * can occur while these are held. Thus, we let callers create the new map,
- * hopefully without the bucket locks held.
+ * Atomically perform a resize and/or reset.
+ * Call with ht->lock held.
*/
-static void qht_do_resize(struct qht *ht, struct qht_map *new)
+static void qht_do_resize_reset(struct qht *ht, struct qht_map *new, bool reset)
{
struct qht_map *old;
old = ht->map;
- g_assert_cmpuint(new->n_buckets, !=, old->n_buckets);
+ qht_map_lock_buckets(old);
+ if (reset) {
+ qht_map_reset__all_locked(old);
+ }
+
+ if (new == NULL) {
+ qht_map_unlock_buckets(old);
+ return;
+ }
+
+ g_assert_cmpuint(new->n_buckets, !=, old->n_buckets);
qht_map_iter__all_locked(ht, old, qht_map_copy, new);
qht_map_debug__all_locked(new);
atomic_rcu_set(&ht->map, new);
+ qht_map_unlock_buckets(old);
call_rcu(old, qht_map_destroy, rcu);
}
@@ -766,12 +776,9 @@ bool qht_resize(struct qht *ht, size_t n_elems)
qemu_mutex_lock(&ht->lock);
if (n_buckets != ht->map->n_buckets) {
struct qht_map *new;
- struct qht_map *old = ht->map;
new = qht_map_create(n_buckets);
- qht_map_lock_buckets(old);
qht_do_resize(ht, new);
- qht_map_unlock_buckets(old);
ret = true;
}
qemu_mutex_unlock(&ht->lock);
--
2.7.4
* [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing Emilio G. Cota
@ 2016-10-05 22:34 ` Emilio G. Cota
2016-10-06 8:31 ` [Qemu-devel] [PATCH 0/3] qht fixes Dr. David Alan Gilbert
2016-10-06 10:56 ` Paolo Bonzini
4 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2016-10-05 22:34 UTC (permalink / raw)
To: Peter Maydell
Cc: Daniel P. Berrange, Paolo Bonzini, Alex Bennée,
Richard Henderson, QEMU Developers
qht_lookup is meant to be called from an RCU read-side critical
section. Make sure we're in such a section in test-qht when
performing lookups, even though test-qht is single-threaded and
therefore cannot trigger any races in qht.
Note that rcu_register_thread is already called by the
rcu_after_fork hook, and therefore duplicating it here would
be a bug.
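For reference, the caller-side pattern this enforces looks roughly like the
sketch below (the wrapper and its name are illustrative; the qht_lookup()
signature is the one used by test-qht):

#include "qemu/osdep.h"
#include "qemu/qht.h"
#include "qemu/rcu.h"

/* Sketch of a lookup wrapper; 'hash' and 'is_equal' must match how the
 * table was populated, as in tests/test-qht.c. */
static bool ht_contains(struct qht *ht, qht_lookup_func_t is_equal,
                        const void *userp, uint32_t hash)
{
    void *p;

    rcu_read_lock();    /* qht_lookup() must run inside an RCU read-side
                         * critical section */
    p = qht_lookup(ht, is_equal, userp, hash);
    rcu_read_unlock();

    return p != NULL;   /* only the pointer value is inspected after unlock */
}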
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tests/test-qht.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tests/test-qht.c b/tests/test-qht.c
index 46a64b6..9b7423a 100644
--- a/tests/test-qht.c
+++ b/tests/test-qht.c
@@ -6,6 +6,7 @@
*/
#include "qemu/osdep.h"
#include "qemu/qht.h"
+#include "qemu/rcu.h"
#define N 5000
@@ -51,6 +52,7 @@ static void check(int a, int b, bool expected)
struct qht_stats stats;
int i;
+ rcu_read_lock();
for (i = a; i < b; i++) {
void *p;
uint32_t hash;
@@ -61,6 +63,8 @@ static void check(int a, int b, bool expected)
p = qht_lookup(&ht, is_equal, &val, hash);
g_assert_true(!!p == expected);
}
+ rcu_read_unlock();
+
qht_statistics_init(&ht, &stats);
if (stats.used_head_buckets) {
g_assert_cmpfloat(qdist_avg(&stats.chain), >=, 1.0);
--
2.7.4
* Re: [Qemu-devel] [PATCH 0/3] qht fixes
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
` (2 preceding siblings ...)
2016-10-05 22:34 ` [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock Emilio G. Cota
@ 2016-10-06 8:31 ` Dr. David Alan Gilbert
2016-10-06 10:56 ` Paolo Bonzini
4 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2016-10-06 8:31 UTC (permalink / raw)
To: Emilio G. Cota
Cc: Peter Maydell, Alex Bennée, Paolo Bonzini, QEMU Developers,
Richard Henderson
* Emilio G. Cota (cota@braap.org) wrote:
> Patch 1 fixes a warning that gcc may unnecessarily emit.
>
> Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
> Daniel reported it yesterday on IRC; the trick to easily trigger it is to
> run on RHEL6 (or CentOS6).
> It is very likely that this is the test-qht failure that Peter reported here:
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
> (we cannot be 100% sure due to the lack of output there; however, the fact that
> it's the resize test that fails supports the hypothesis that we're indeed hitting
> the same bug.)
> I'm therefore adding Peter's reported-by tag to the patch along with Daniel's.
>
> Patch 3 is merely good practice, since test-qht is single-threaded. However
> I like having it, since test-qht serves as a de facto usage example
> of qht.
>
> Given that patch 2 alone would conflict without previously applying patch 1,
> I propose to merge both patches 1 & 2 to 2.7-stable. I'll send them to
> qemu-stable once the patchset is picked up to be merged onto master.
That seems to fix the test-qht failure I was seeing on RHEL6.
Dave
> Thanks,
>
> Emilio
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH 0/3] qht fixes
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
` (3 preceding siblings ...)
2016-10-06 8:31 ` [Qemu-devel] [PATCH 0/3] qht fixes Dr. David Alan Gilbert
@ 2016-10-06 10:56 ` Paolo Bonzini
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-10-06 10:56 UTC (permalink / raw)
To: Emilio G. Cota, Peter Maydell
Cc: Daniel P. Berrange, Alex Bennée, Richard Henderson,
QEMU Developers
On 06/10/2016 00:34, Emilio G. Cota wrote:
> Patch 1 fixes a warning that gcc may unnecessarily emit.
>
> Patch 2 fixes a real bug that sometimes shows up as a segfault in test-qht.
> Daniel reported it yesterday on IRC; the trick to easily trigger it is to
> run on RHEL6 (or CentOS6).
> It is very likely that this is the test-qht failure that Peter reported here:
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03771.html
> (we cannot be 100% sure due to the lack of output there; however, the fact that
> it's the resize test that fails supports the hypothesis that we're indeed hitting
> the same bug.)
> I'm therefore adding Peter's reported-by tag to the patch along with Daniel's.
>
> Patch 3 is merely good practice, since test-qht is single-threaded. However
> I like having it, since test-qht serves as a de facto usage example
> of qht.
>
> Given that patch 2 alone would conflict without previously applying patch 1,
> I propose to merge both patches 1 & 2 to 2.7-stable. I'll send them to
> qemu-stable once the patchset is picked up to be merged onto master.
>
> Thanks,
>
> Emilio
>
Queued, thanks.
Paolo
Thread overview: 9+ messages
2016-08-24 20:39 [Qemu-devel] transient failure in the test-qht tests Peter Maydell
2016-08-24 23:44 ` Emilio G. Cota
2016-08-24 23:52 ` Peter Maydell
2016-10-05 22:34 ` [Qemu-devel] [PATCH 0/3] qht fixes Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 1/3] qht: simplify qht_reset_size Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 2/3] qht: fix unlock-after-free segfault upon resizing Emilio G. Cota
2016-10-05 22:34 ` [Qemu-devel] [PATCH 3/3] test-qht: perform lookups under rcu_read_lock Emilio G. Cota
2016-10-06 8:31 ` [Qemu-devel] [PATCH 0/3] qht fixes Dr. David Alan Gilbert
2016-10-06 10:56 ` Paolo Bonzini