* [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
@ 2015-03-10 15:45 Paolo Bonzini
2015-03-10 22:51 ` Christian Borntraeger
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Paolo Bonzini @ 2015-03-10 15:45 UTC (permalink / raw)
To: qemu-devel; +Cc: borntraeger, stefanha
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.

Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC checked for success by comparing against the
second argument to atomic_cmpxchg. Unfortunately, this only works if
the second argument is a local or thread-local variable.

If it lives in memory, it can be subject to common subexpression
elimination (and then everything is fine) or reloaded after the
atomic_cmpxchg, depending on the compiler's whims. In the latter case
the race can occur: a thread can sneak in and modify elm->field.sle_next
between the atomic_cmpxchg and the comparison. This causes a spurious
failure, after which two threads are using "elm" at the same time. In
the case discovered by Christian, the sequence was likely something
like this:
    thread 1                      | thread 2
    QSLIST_INSERT_HEAD_ATOMIC     |
      atomic_cmpxchg succeeds     |
      elm added to list           |
                                  | steal release_pool
                                  | QSLIST_REMOVE_HEAD
                                  |   elm removed from list
                                  | ...
                                  | QSLIST_INSERT_HEAD_ATOMIC
                                  |   (overwrites sle_next)
      spurious failure            |
      atomic_cmpxchg succeeds     |
      elm added to list again     |
                                  |
    steal release_pool            |
    QSLIST_REMOVE_HEAD            |
      elm removed again           |
The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:
- the guest has 32 VCPUs on a 28-core host (hyperthreading was enabled);
memory was 16G just to err on the safe side (the host has 64G, but hey,
at least you need no s390)
- the guest has 24 null-aio virtio-blk devices using dataplane
(-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
-device virtio-blk-pci,iothread=ioN,drive=blkN)
- the guest also has a single network interface. It's only doing loopback
tests, so slirp vs. tap and the NIC model don't matter.
- the guest is running fio with the following script:
[global]
rw=randread
blocksize=16k
ioengine=libaio
runtime=10m
buffered=0
fallocate=none
time_based
iodepth=32
[virtio1a]
filename=/dev/block/252\:16
[virtio1b]
filename=/dev/block/252\:16
...
[virtio24a]
filename=/dev/block/252\:384
[virtio24b]
filename=/dev/block/252\:384
[listen1]
protocol=tcp
ioengine=net
port=12345
listen
rw=read
bs=4k
size=1000g
[connect1]
protocol=tcp
hostname=localhost
ioengine=net
port=12345
rw=write
startdelay=1
size=1000g
...
[listen8]
protocol=tcp
ioengine=net
port=12352
listen
rw=read
bs=4k
size=1000g
[connect8]
protocol=tcp
hostname=localhost
ioengine=net
port=12352
rw=write
startdelay=1
size=1000g
Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qemu/queue.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 8094150..f781aa2 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -197,11 +197,12 @@ struct { \
(head)->slh_first = (elm); \
} while (/*CONSTCOND*/0)
-#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \
- do { \
- (elm)->field.sle_next = (head)->slh_first; \
- } while (atomic_cmpxchg(&(head)->slh_first, (elm)->field.sle_next, \
- (elm)) != (elm)->field.sle_next); \
+#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \
+ typeof(elm) save_sle_next; \
+ do { \
+ save_sle_next = (elm)->field.sle_next = (head)->slh_first; \
+ } while (atomic_cmpxchg(&(head)->slh_first, save_sle_next, (elm)) != \
+ save_sle_next); \
} while (/*CONSTCOND*/0)
#define QSLIST_MOVE_ATOMIC(dest, src) do { \
--
2.3.0
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Christian Borntraeger @ 2015-03-10 22:51 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: stefanha
On 10.03.2015 at 16:45, Paolo Bonzini wrote:
[...]
>
> Moral of the story: I should refrain from writing more clever stuff.
> At least it looks like it is not too clever to be undebuggable.
>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Stefan Hajnoczi @ 2015-03-12 13:34 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: borntraeger, qemu-devel, stefanha
On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote:
> There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
[...]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race
From: Stefan Hajnoczi @ 2015-03-12 13:35 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: borntraeger, qemu-devel, stefanha
On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote:
> There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
[...]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks, applied to my block tree for QEMU 2.3:
https://github.com/stefanha/qemu/commits/block
Stefan