* [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
@ 2015-03-10 15:45 Paolo Bonzini
2015-03-10 22:51 ` Christian Borntraeger
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Paolo Bonzini @ 2015-03-10 15:45 UTC (permalink / raw)
To: qemu-devel; +Cc: borntraeger, stefanha
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.

Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC checked for success by comparing against the
second argument to atomic_cmpxchg. Unfortunately, this only works if
the second argument is a local or thread-local variable.

If it lives in memory, it can be subject to common subexpression
elimination (and then everything is fine) or reloaded after the
atomic_cmpxchg, depending on the compiler's whims. In the latter case
the race can occur: a thread can sneak in and modify elm->field.sle_next
between the atomic_cmpxchg and the comparison. This causes a spurious
failure, after which two threads are using "elm" at the same time. In
the case discovered by Christian, the sequence was likely something
like this:
    thread 1                      | thread 2
    QSLIST_INSERT_HEAD_ATOMIC     |
      atomic_cmpxchg succeeds     |
      elm added to list           |
                                  | steal release_pool
                                  | QSLIST_REMOVE_HEAD
                                  |   elm removed from list
                                  | ...
                                  | QSLIST_INSERT_HEAD_ATOMIC
                                  |   (overwrites sle_next)
      spurious failure            |
      atomic_cmpxchg succeeds     |
      elm added to list again     |
                                  |
    steal release_pool            |
    QSLIST_REMOVE_HEAD            |
      elm removed again           |
The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:
- the guest has 32 VCPUs on a 28-core host (hyperthreading was enabled);
memory was 16G just to err on the safe side (the host has 64G, but hey,
at least you need no s390)
- the guest has 24 null-aio virtio-blk devices using dataplane
(-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
-device virtio-blk-pci,iothread=ioN,drive=blkN)
- the guest also has a single network interface. It's only doing loopback
tests, so slirp vs. tap and the NIC model don't matter.
- the guest is running fio with the following script:
[global]
rw=randread
blocksize=16k
ioengine=libaio
runtime=10m
buffered=0
fallocate=none
time_based
iodepth=32
[virtio1a]
filename=/dev/block/252\:16
[virtio1b]
filename=/dev/block/252\:16
...
[virtio24a]
filename=/dev/block/252\:384
[virtio24b]
filename=/dev/block/252\:384
[listen1]
protocol=tcp
ioengine=net
port=12345
listen
rw=read
bs=4k
size=1000g
[connect1]
protocol=tcp
hostname=localhost
ioengine=net
port=12345
rw=write
startdelay=1
size=1000g
...
[listen8]
protocol=tcp
ioengine=net
port=12352
listen
rw=read
bs=4k
size=1000g
[connect8]
protocol=tcp
hostname=localhost
ioengine=net
port=12352
rw=write
startdelay=1
size=1000g
Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qemu/queue.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 8094150..f781aa2 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -197,11 +197,12 @@ struct { \
(head)->slh_first = (elm); \
} while (/*CONSTCOND*/0)
-#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \
- do { \
- (elm)->field.sle_next = (head)->slh_first; \
- } while (atomic_cmpxchg(&(head)->slh_first, (elm)->field.sle_next, \
- (elm)) != (elm)->field.sle_next); \
+#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \
+ typeof(elm) save_sle_next; \
+ do { \
+ save_sle_next = (elm)->field.sle_next = (head)->slh_first; \
+ } while (atomic_cmpxchg(&(head)->slh_first, save_sle_next, (elm)) != \
+ save_sle_next); \
} while (/*CONSTCOND*/0)
#define QSLIST_MOVE_ATOMIC(dest, src) do { \
--
2.3.0
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Christian Borntraeger @ 2015-03-10 22:51 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: stefanha
On 10.03.2015 at 16:45, Paolo Bonzini wrote:
[...]
>
> Moral of the story: I should refrain from writing more clever stuff.
> At least it looks like it is not too clever to be undebuggable.
>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Stefan Hajnoczi @ 2015-03-12 13:34 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: borntraeger, qemu-devel, stefanha
On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote:
> There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
[...]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Stefan Hajnoczi @ 2015-03-12 13:35 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: borntraeger, qemu-devel, stefanha
On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote:
> There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
[...]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks, applied to my block tree for QEMU 2.3:
https://github.com/stefanha/qemu/commits/block
Stefan