* [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27
@ 2023-05-23 10:14 Michael Tokarev
2023-05-23 10:14 ` [Stable-8.0.1 35/59] s390x/tcg: Fix LDER instruction format Michael Tokarev
` (22 more replies)
0 siblings, 23 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:14 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Michael Tokarev
The following patches are queued for QEMU stable v8.0.1:
https://gitlab.com/qemu-project/qemu/-/commits/staging-8.0
Patch freeze is 2023-05-27, and the release is planned for 2023-05-29:
https://wiki.qemu.org/Planning/8.0
Please respond here or CC qemu-stable@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.
The changes staged for inclusion, with the original commit hashes from the
master branch, are listed below the separator line.
Thanks!
/mjt
--------------------------------------
01* 3f9c41c5df96 Paolo Bonzini:
vnc: avoid underflow when accessing user-provided address
02* 72497cff896f Yang Zhong:
target/i386: Change wrong XFRM value in SGX CPUID leaf
03* 542fd43d7932 Axel Heider:
hw/timer/imx_epit: don't shadow variable
04* 25d758175dfb Axel Heider:
hw/timer/imx_epit: fix limit check
05* 0f689cf5ada4 Igor Mammedov:
acpi: pcihp: allow repeating hot-unplug requests
06* 8c1e8fb2e7fc Wang Liang:
block/monitor: Fix crash when executing HMP commit
07* c1654c3e37c3 Alex Bennée:
qemu-options: finesse the recommendations around -blockdev
08* ac64ebbecf80 Peter Maydell:
docs/about/deprecated.rst: Add "since 7.1" tag to dtb-kaslr-seed
deprecation
09* ad5c6ddea327 Akihiko Odaki:
target/arm: Initialize debug capabilities only once
10* d565f58b3842 Peter Maydell:
hw/net/msf2-emac: Don't modify descriptor in-place in emac_store_desc()
11* 0fe43f0abf19 Cédric Le Goater:
hw/arm/boot: Make write_bootloader() public as arm_write_bootloader()
12* 902bba549fc3 Cédric Le Goater:
hw/arm/aspeed: Use arm_write_bootloader() to write the bootloader
13* 0acbdb4c4ab6 Peter Maydell:
hw/arm/raspi: Use arm_write_bootloader() to write boot code
14* 2c5fa0778c3b Peter Maydell:
hw/intc/allwinner-a10-pic: Don't use set_bit()/clear_bit()
15* 7f3a3d3dc433 Peter Maydell:
target/arm: Define and use new load_cpu_field_low32()
16* 3e20d90824c2 Peter Maydell:
hw/sd/allwinner-sdhost: Correctly byteswap descriptor fields
17* a4ae17e5ec51 Peter Maydell:
hw/net/allwinner-sun8i-emac: Correctly byteswap descriptor fields
18* de79b52604e4 Stefan Hajnoczi:
block/export: call blk_set_dev_ops(blk, NULL, NULL)
19* 1098cc3fcf95 Shivaprasad G Bhat:
softfloat: Fix the incorrect computation in float32_exp2
20* ef709860ea12 Paolo Bonzini:
meson: leave unnecessary modules out of the build
21* e2626874a326 Kevin Wolf:
block: Fix use after free in blockdev_mark_auto_del()
22* da4afaff074e Kevin Wolf:
block: Consistently call bdrv_activate() outside coroutine
23* b2ab5f545fa1 Kevin Wolf:
block: bdrv/blk_co_unref() for calls in coroutine context
24* 0c7d204f50c3 Kevin Wolf:
block: Don't call no_coroutine_fns in qmp_block_resize()
25* df3ac6da476e LIU Zhiwei:
target/riscv: Fix itrigger when icount is used
26* eae04c4c131a Bin Meng:
target/riscv: Restore the predicate() NULL check behavior
27* 9136f661c727 Jonathan Cameron:
hw/pci-bridge: pci_expander_bridge fix type in pxb_cxl_dev_reset()
28* 8c313254e61e Richard Henderson:
accel/tcg: Fix atomic_mmu_lookup for reads
29* fcc0b0418fff Peter Maydell:
target/arm: Fix handling of SW and NSW bits for stage 2 walks
30* cd22a0f520f4 Peter Maydell:
ui: Fix pixel colour channel order for PNG screenshots
31* 478dccbb99db Peter Maydell:
target/arm: Correct AArch64.S2MinTxSZ 32-bit EL1 input size check
32* d66ba6dc1cce Cédric Le Goater:
async: Suppress GCC13 false positive in aio_bh_poll()
33* 6a5d81b17201 Shivaprasad G Bhat:
tcg: ppc64: Fix mask generation for vextractdm
34* e8ecdfeb30f0 Ilya Leoshkevich:
target/s390x: Fix EXECUTE of relative branches
35 970641de0190 Ilya Leoshkevich:
s390x/tcg: Fix LDER instruction format
36* 92e667f6fd58 Jason Andryuk:
9pfs/xen: Fix segfault on shutdown
37* 988998503bc6 Richard Henderson:
tcg/i386: Set P_REXW in tcg_out_addi_ptr
38 88693ab2a53f Claudio Imbrenda:
s390x/pv: Fix spurious warning with asynchronous teardown
39 80bd81cadd12 Claudio Imbrenda:
util/async-teardown: wire up query-command-line-options
40 c70bb9a771d4 Lizhi Yang:
docs/about/emulation: fix typo
41 3217b84f3cd8 Alex Bennée:
tests/docker: bump the xtensa base to debian:11-slim
42 a0f8d2701b20 Daniil Kovalev:
linux-user: Fix mips fp64 executables loading
43 1e35d327890b Michael Tokarev:
linux-user: fix getgroups/setgroups allocations
44 403d18ae3842 Eric Blake:
migration: Handle block device inactivation failures better
45 5d39f44d7ac5 Eric Blake:
migration: Minor control flow simplification
46 6dab4c93ecfa Eric Blake:
migration: Attempt disk reactivation in more failure scenarios
47 a6771f2f5cbf Richard Henderson:
target/arm: Fix vd == vm overlap in sve_ldff1_z
48 9bd634b2f5e2 Paolo Bonzini:
scsi-generic: fix buffer overflow on block limits inquiry
49 2b55e479e6fc Paolo Bonzini:
target/i386: fix operand size for VCOMI/VUCOMI instructions
50 056d649007bc Xinyu Li:
target/i386: fix avx2 instructions vzeroall and vpermdq
51 5d410557dea4 Hawkins Jiawei:
vhost: fix possible wrap in SVQ descriptor ring
52 5ed3dabe57dd Leonardo Bras:
hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0
53 1fac00f70b32 Eugenio Pérez:
virtio-net: not enable vq reset feature unconditionally
54 3e69908907f8 Mauro Matteo Cascella:
virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
55 6d740fb01b9f Stefan Hajnoczi:
aio-posix: do not nest poll handlers
56 844a12a63e12 Stefan Hajnoczi:
tested: add test for nested aio_poll() in poll handlers
57 58a2e3f5c37b Stefan Hajnoczi:
block: compile out assert_bdrv_graph_readable() by default
58 80fc5d260002 Kevin Wolf:
graph-lock: Disable locking for now
59 7c1f51bf38de Kevin Wolf:
nbd/server: Fix drained_poll to wake coroutine in right AioContext
(commit(s) marked with * were in previous series and are not resent)
* [Stable-8.0.1 35/59] s390x/tcg: Fix LDER instruction format
From: Michael Tokarev @ 2023-05-23 10:14 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Ilya Leoshkevich, David Hildenbrand,
Richard Henderson, Thomas Huth, Michael Tokarev
From: Ilya Leoshkevich <iii@linux.ibm.com>
It's RRE, not RXE.
Found by running valgrind's none/tests/s390x/bfp-2.
Fixes: 86b59624c4aa ("s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230511134726.469651-1-iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 970641de01908dd09b569965e78f13842e5854bc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 597d968b0e..1f1ac742a9 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -606,7 +606,7 @@
F(0xed04, LDEB, RXE, Z, 0, m2_32u, new, f1, ldeb, 0, IF_BFP)
F(0xed05, LXDB, RXE, Z, 0, m2_64, new_x, x1, lxdb, 0, IF_BFP)
F(0xed06, LXEB, RXE, Z, 0, m2_32u, new_x, x1, lxeb, 0, IF_BFP)
- F(0xb324, LDER, RXE, Z, 0, e2, new, f1, lde, 0, IF_AFP1)
+ F(0xb324, LDER, RRE, Z, 0, e2, new, f1, lde, 0, IF_AFP1)
F(0xed24, LDE, RXE, Z, 0, m2_32u, new, f1, lde, 0, IF_AFP1)
/* LOAD ROUNDED */
F(0xb344, LEDBR, RRF_e, Z, 0, f2, new, e1, ledb, 0, IF_BFP)
--
2.39.2
* [Stable-8.0.1 38/59] s390x/pv: Fix spurious warning with asynchronous teardown
From: Michael Tokarev @ 2023-05-23 10:14 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Claudio Imbrenda, Marc Hartmayer, Thomas Huth,
Michael Tokarev
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
Kernel commit 292a7d6fca33 ("KVM: s390: pv: fix asynchronous teardown
for small VMs") causes the KVM_PV_ASYNC_CLEANUP_PREPARE ioctl to fail
if the VM is not larger than 2GiB. QEMU would attempt it and fail,
print an error message, and then proceed with a normal teardown.
Avoid attempting to use asynchronous teardown altogether when the VM is
not larger than 2 GiB. This will avoid triggering the error message and
also avoid pointless overhead; normal teardown is fast enough for small
VMs.
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: c3a073c610 ("s390x/pv: Add support for asynchronous teardown for reboot")
Link: https://lore.kernel.org/all/20230421085036.52511-2-imbrenda@linux.ibm.com/
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20230510105531.30623-2-imbrenda@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Fix inline function parameter in pv.h]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 88693ab2a53f2f3d25cb39a7b5034ab391bc5a81)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 49ea38236c..b63f3784c6 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -13,6 +13,7 @@
#include <linux/kvm.h>
+#include "qemu/units.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "sysemu/kvm.h"
@@ -115,7 +116,7 @@ static void *s390_pv_do_unprot_async_fn(void *p)
return NULL;
}
-bool s390_pv_vm_try_disable_async(void)
+bool s390_pv_vm_try_disable_async(S390CcwMachineState *ms)
{
/*
* t is only needed to create the thread; once qemu_thread_create
@@ -123,7 +124,12 @@ bool s390_pv_vm_try_disable_async(void)
*/
QemuThread t;
- if (!kvm_check_extension(kvm_state, KVM_CAP_S390_PROTECTED_ASYNC_DISABLE)) {
+ /*
+ * If the feature is not present or if the VM is not larger than 2 GiB,
+ * KVM_PV_ASYNC_CLEANUP_PREPARE fill fail; no point in attempting it.
+ */
+ if ((MACHINE(ms)->maxram_size <= 2 * GiB) ||
+ !kvm_check_extension(kvm_state, KVM_CAP_S390_PROTECTED_ASYNC_DISABLE)) {
return false;
}
if (s390_pv_cmd(KVM_PV_ASYNC_CLEANUP_PREPARE, NULL) != 0) {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 503f212a31..0daf445d60 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -330,7 +330,7 @@ static inline void s390_do_cpu_ipl(CPUState *cs, run_on_cpu_data arg)
static void s390_machine_unprotect(S390CcwMachineState *ms)
{
- if (!s390_pv_vm_try_disable_async()) {
+ if (!s390_pv_vm_try_disable_async(ms)) {
s390_pv_vm_disable();
}
ms->pv = false;
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index 966306a9db..7b935e2246 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -14,10 +14,10 @@
#include "qapi/error.h"
#include "sysemu/kvm.h"
+#include "hw/s390x/s390-virtio-ccw.h"
#ifdef CONFIG_KVM
#include "cpu.h"
-#include "hw/s390x/s390-virtio-ccw.h"
static inline bool s390_is_pv(void)
{
@@ -41,7 +41,7 @@ static inline bool s390_is_pv(void)
int s390_pv_query_info(void);
int s390_pv_vm_enable(void);
void s390_pv_vm_disable(void);
-bool s390_pv_vm_try_disable_async(void);
+bool s390_pv_vm_try_disable_async(S390CcwMachineState *ms);
int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak);
void s390_pv_prep_reset(void);
@@ -61,7 +61,7 @@ static inline bool s390_is_pv(void) { return false; }
static inline int s390_pv_query_info(void) { return 0; }
static inline int s390_pv_vm_enable(void) { return 0; }
static inline void s390_pv_vm_disable(void) {}
-static inline bool s390_pv_vm_try_disable_async(void) { return false; }
+static inline bool s390_pv_vm_try_disable_async(S390CcwMachineState *ms) { return false; }
static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length) { return 0; }
static inline int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak) { return 0; }
static inline void s390_pv_prep_reset(void) {}
--
2.39.2
* [Stable-8.0.1 39/59] util/async-teardown: wire up query-command-line-options
From: Michael Tokarev @ 2023-05-23 10:14 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Claudio Imbrenda, Boris Fiuczynski, Thomas Huth,
Michael Tokarev
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
Add a new -run-with option with an async-teardown=on|off parameter. It is
visible in the output of query-command-line-options QMP command, so it
can be discovered and used by libvirt.
The -async-teardown option is now redundant; deprecate it.
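Once wired up, the option becomes visible to management software; a
query-command-line-options exchange would look roughly like this (the response
shape is illustrative, not captured from a real QEMU session):

```
-> { "execute": "query-command-line-options",
     "arguments": { "option": "run-with" } }
<- { "return": [ { "option": "run-with",
                   "parameters": [ { "name": "async-teardown",
                                     "type": "boolean" } ] } ] }
```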
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Fixes: c891c24b1a ("os-posix: asynchronous teardown for shutdown on Linux")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20230505120051.36605-2-imbrenda@linux.ibm.com>
[thuth: Add curly braces to fix error with GCC 8.5, fix bug in deprecated.rst]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 80bd81cadd127c1e2fc784612a52abe392670ba4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context tweak in docs/about/deprecated.rst)
diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 914938fd76..2823362791 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -111,6 +111,10 @@ Use ``-machine acpi=off`` instead.
The HAXM project has been retired (see https://github.com/intel/haxm#status).
Use "whpx" (on Windows) or "hvf" (on macOS) instead.
+``-async-teardown`` (since 8.1)
+'''''''''''''''''''''''''''''''
+
+Use ``-run-with async-teardown=on`` instead.
QEMU Machine Protocol (QMP) commands
------------------------------------
diff --git a/os-posix.c b/os-posix.c
index 5adc69f560..90ea71725f 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -36,6 +36,8 @@
#include "qemu/log.h"
#include "sysemu/runstate.h"
#include "qemu/cutils.h"
+#include "qemu/config-file.h"
+#include "qemu/option.h"
#ifdef CONFIG_LINUX
#include <sys/prctl.h>
@@ -152,9 +154,21 @@ int os_parse_cmd_args(int index, const char *optarg)
daemonize = 1;
break;
#if defined(CONFIG_LINUX)
+ /* deprecated */
case QEMU_OPTION_asyncteardown:
init_async_teardown();
break;
+ case QEMU_OPTION_run_with: {
+ QemuOpts *opts = qemu_opts_parse_noisily(qemu_find_opts("run-with"),
+ optarg, false);
+ if (!opts) {
+ exit(1);
+ }
+ if (qemu_opt_get_bool(opts, "async-teardown", false)) {
+ init_async_teardown();
+ }
+ break;
+ }
#endif
default:
return -1;
diff --git a/qemu-options.hx b/qemu-options.hx
index 4b8855a4f7..fdddfab6ff 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4786,20 +4786,32 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
"-async-teardown enable asynchronous teardown\n",
QEMU_ARCH_ALL)
-#endif
SRST
``-async-teardown``
- Enable asynchronous teardown. A new process called "cleanup/<QEMU_PID>"
- will be created at startup sharing the address space with the main qemu
- process, using clone. It will wait for the main qemu process to
- terminate completely, and then exit.
- This allows qemu to terminate very quickly even if the guest was
- huge, leaving the teardown of the address space to the cleanup
- process. Since the cleanup process shares the same cgroups as the
- main qemu process, accounting is performed correctly. This only
- works if the cleanup process is not forcefully killed with SIGKILL
- before the main qemu process has terminated completely.
+ This option is deprecated and should no longer be used. The new option
+ ``-run-with async-teardown=on`` is a replacement.
ERST
+DEF("run-with", HAS_ARG, QEMU_OPTION_run_with,
+ "-run-with async-teardown[=on|off]\n"
+ " misc QEMU process lifecycle options\n"
+ " async-teardown=on enables asynchronous teardown\n",
+ QEMU_ARCH_ALL)
+SRST
+``-run-with``
+ Set QEMU process lifecycle options.
+
+ ``async-teardown=on`` enables asynchronous teardown. A new process called
+ "cleanup/<QEMU_PID>" will be created at startup sharing the address
+ space with the main QEMU process, using clone. It will wait for the
+ main QEMU process to terminate completely, and then exit. This allows
+ QEMU to terminate very quickly even if the guest was huge, leaving the
+ teardown of the address space to the cleanup process. Since the cleanup
+ process shares the same cgroups as the main QEMU process, accounting is
+ performed correctly. This only works if the cleanup process is not
+ forcefully killed with SIGKILL before the main QEMU process has
+ terminated completely.
+ERST
+#endif
DEF("msg", HAS_ARG, QEMU_OPTION_msg,
"-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
diff --git a/util/async-teardown.c b/util/async-teardown.c
index 62cdeb0f20..3ab19c8740 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -12,6 +12,9 @@
*/
#include "qemu/osdep.h"
+#include "qemu/config-file.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
#include <dirent.h>
#include <sys/prctl.h>
#include <sched.h>
@@ -144,3 +147,21 @@ void init_async_teardown(void)
clone(async_teardown_fn, new_stack_for_clone(), CLONE_VM, NULL);
sigprocmask(SIG_SETMASK, &old_signals, NULL);
}
+
+static QemuOptsList qemu_run_with_opts = {
+ .name = "run-with",
+ .head = QTAILQ_HEAD_INITIALIZER(qemu_run_with_opts.head),
+ .desc = {
+ {
+ .name = "async-teardown",
+ .type = QEMU_OPT_BOOL,
+ },
+ { /* end of list */ }
+ },
+};
+
+static void register_teardown(void)
+{
+ qemu_add_opts(&qemu_run_with_opts);
+}
+opts_init(register_teardown);
--
2.39.2
* [Stable-8.0.1 40/59] docs/about/emulation: fix typo
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Lizhi Yang, Philippe Mathieu-Daudé, Thomas Huth,
Michael Tokarev
From: Lizhi Yang <sledgeh4w@gmail.com>
Duplicated word "are".
Signed-off-by: Lizhi Yang <sledgeh4w@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230511080119.99018-1-sledgeh4w@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit c70bb9a771d467302d1c7df5c5bd56b48f42716e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst
index b510a54418..0ad0b86f0d 100644
--- a/docs/about/emulation.rst
+++ b/docs/about/emulation.rst
@@ -99,7 +99,7 @@ depending on the guest architecture.
- Yes
- A configurable 32 bit soft core now owned by Cadence
-A number of features are are only available when running under
+A number of features are only available when running under
emulation including :ref:`Record/Replay<replay>` and :ref:`TCG Plugins`.
.. _Semihosting:
--
2.39.2
* [Stable-8.0.1 41/59] tests/docker: bump the xtensa base to debian:11-slim
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Alex Bennée, Thomas Huth, Juan Quintela,
Richard Henderson, Michael Tokarev
From: Alex Bennée <alex.bennee@linaro.org>
Stretch is going out of support so things like security updates will
fail. As the toolchain itself is binary it hopefully won't mind the
underlying OS being updated.
Message-Id: <20230503091244.1450613-3-alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 3217b84f3cd813a7daffc64b26543c313f3a042a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/tests/docker/dockerfiles/debian-xtensa-cross.docker b/tests/docker/dockerfiles/debian-xtensa-cross.docker
index 082b50da19..72c25d63d9 100644
--- a/tests/docker/dockerfiles/debian-xtensa-cross.docker
+++ b/tests/docker/dockerfiles/debian-xtensa-cross.docker
@@ -5,7 +5,7 @@
# using a prebuilt toolchains for Xtensa cores from:
# https://github.com/foss-xtensa/toolchain/releases
#
-FROM docker.io/library/debian:stretch-slim
+FROM docker.io/library/debian:11-slim
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt install -yy eatmydata && \
--
2.39.2
* [Stable-8.0.1 42/59] linux-user: Fix mips fp64 executables loading
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniil Kovalev, Jiaxun Yang, Laurent Vivier,
Michael Tokarev
From: Daniil Kovalev <dkovalev@compiler-toolchain-for.me>
If a program requires fr1, we should set the FR bit of CP0 control status
register and add F64 hardware flag. The corresponding `else if` branch
statement is copied from the linux kernel sources (see `arch_check_elf` function
in linux/arch/mips/kernel/elf.c).
Signed-off-by: Daniil Kovalev <dkovalev@compiler-toolchain-for.me>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20230404052153.16617-1-dkovalev@compiler-toolchain-for.me>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit a0f8d2701b205d9d7986aa555e0566b13dc18fa0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c
index d5c1c7941d..8735e58bad 100644
--- a/linux-user/mips/cpu_loop.c
+++ b/linux-user/mips/cpu_loop.c
@@ -290,7 +290,10 @@ void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
env->CP0_Status |= (1 << CP0St_FR);
env->hflags |= MIPS_HFLAG_F64;
}
- } else if (!prog_req.fre && !prog_req.frdefault &&
+ } else if (prog_req.fr1) {
+ env->CP0_Status |= (1 << CP0St_FR);
+ env->hflags |= MIPS_HFLAG_F64;
+ } else if (!prog_req.fre && !prog_req.frdefault &&
!prog_req.fr1 && !prog_req.single && !prog_req.soft) {
fprintf(stderr, "qemu: Can't find a matching FPU mode\n");
exit(1);
--
2.39.2
* [Stable-8.0.1 43/59] linux-user: fix getgroups/setgroups allocations
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Michael Tokarev, Laurent Vivier
linux-user getgroups(), setgroups(), getgroups32() and setgroups32()
used alloca() to allocate grouplist arrays, with unchecked gidsetsize
coming from the "guest". With NGROUPS_MAX being 65536 on Linux (and it
is common for an application to allocate NGROUPS_MAX for getgroups()),
a typical allocation is half a megabyte on the stack. This overflows
the stack and leads to an immediate SIGSEGV in the actual system
getgroups() implementation.
An example of such issue is aptitude, eg
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811087#72
Cap gidsetsize to NGROUPS_MAX (return EINVAL if it is larger than that),
and use heap allocation for grouplist instead of alloca(). While at it,
fix coding style and make all 4 implementations identical.
Try to not impose random limits - for example, allow gidsetsize to be
negative for getgroups() - just do not allocate negative-sized grouplist
in this case but still do actual getgroups() call. But do not allow
negative gidsetsize for setgroups() since its argument is unsigned.
Capping by NGROUPS_MAX seems a bit arbitrary - we could allow more, since
it is not an error if the set size is NGROUPS_MAX+1. But we should not
allow integer overflow for the array being allocated. Maybe it is enough to
just call g_try_new() and return ENOMEM if it fails.
Maybe there's also no need to convert setgroups() since this one is
usually smaller and known beforehand (KERN_NGROUPS_MAX is actually 63, -
this is apparently a kernel-imposed limit for runtime group set).
The patch fixes aptitude segfault mentioned above.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230409105327.1273372-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 1e35d327890bdd117a67f79c52e637fb12bb1bf4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 69f740ff98..333e6b7026 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11475,39 +11475,58 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
{
int gidsetsize = arg1;
target_id *target_grouplist;
- gid_t *grouplist;
+ g_autofree gid_t *grouplist = NULL;
int i;
- grouplist = alloca(gidsetsize * sizeof(gid_t));
+ if (gidsetsize > NGROUPS_MAX) {
+ return -TARGET_EINVAL;
+ }
+ if (gidsetsize > 0) {
+ grouplist = g_try_new(gid_t, gidsetsize);
+ if (!grouplist) {
+ return -TARGET_ENOMEM;
+ }
+ }
ret = get_errno(getgroups(gidsetsize, grouplist));
- if (gidsetsize == 0)
- return ret;
- if (!is_error(ret)) {
- target_grouplist = lock_user(VERIFY_WRITE, arg2, gidsetsize * sizeof(target_id), 0);
- if (!target_grouplist)
+ if (!is_error(ret) && gidsetsize > 0) {
+ target_grouplist = lock_user(VERIFY_WRITE, arg2,
+ gidsetsize * sizeof(target_id), 0);
+ if (!target_grouplist) {
return -TARGET_EFAULT;
- for(i = 0;i < ret; i++)
+ }
+ for (i = 0; i < ret; i++) {
target_grouplist[i] = tswapid(high2lowgid(grouplist[i]));
- unlock_user(target_grouplist, arg2, gidsetsize * sizeof(target_id));
+ }
+ unlock_user(target_grouplist, arg2,
+ gidsetsize * sizeof(target_id));
}
+ return ret;
}
- return ret;
case TARGET_NR_setgroups:
{
int gidsetsize = arg1;
target_id *target_grouplist;
- gid_t *grouplist = NULL;
+ g_autofree gid_t *grouplist = NULL;
int i;
- if (gidsetsize) {
- grouplist = alloca(gidsetsize * sizeof(gid_t));
- target_grouplist = lock_user(VERIFY_READ, arg2, gidsetsize * sizeof(target_id), 1);
+
+ if (gidsetsize > NGROUPS_MAX || gidsetsize < 0) {
+ return -TARGET_EINVAL;
+ }
+ if (gidsetsize > 0) {
+ grouplist = g_try_new(gid_t, gidsetsize);
+ if (!grouplist) {
+ return -TARGET_ENOMEM;
+ }
+ target_grouplist = lock_user(VERIFY_READ, arg2,
+ gidsetsize * sizeof(target_id), 1);
if (!target_grouplist) {
return -TARGET_EFAULT;
}
for (i = 0; i < gidsetsize; i++) {
grouplist[i] = low2highgid(tswapid(target_grouplist[i]));
}
- unlock_user(target_grouplist, arg2, 0);
+ unlock_user(target_grouplist, arg2,
+ gidsetsize * sizeof(target_id));
}
return get_errno(setgroups(gidsetsize, grouplist));
}
@@ -11792,41 +11811,59 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
{
int gidsetsize = arg1;
uint32_t *target_grouplist;
- gid_t *grouplist;
+ g_autofree gid_t *grouplist = NULL;
int i;
- grouplist = alloca(gidsetsize * sizeof(gid_t));
+ if (gidsetsize > NGROUPS_MAX) {
+ return -TARGET_EINVAL;
+ }
+ if (gidsetsize > 0) {
+ grouplist = g_try_new(gid_t, gidsetsize);
+ if (!grouplist) {
+ return -TARGET_ENOMEM;
+ }
+ }
ret = get_errno(getgroups(gidsetsize, grouplist));
- if (gidsetsize == 0)
- return ret;
- if (!is_error(ret)) {
- target_grouplist = lock_user(VERIFY_WRITE, arg2, gidsetsize * 4, 0);
+ if (!is_error(ret) && gidsetsize > 0) {
+ target_grouplist = lock_user(VERIFY_WRITE, arg2,
+ gidsetsize * 4, 0);
if (!target_grouplist) {
return -TARGET_EFAULT;
}
- for(i = 0;i < ret; i++)
+ for (i = 0; i < ret; i++) {
target_grouplist[i] = tswap32(grouplist[i]);
+ }
unlock_user(target_grouplist, arg2, gidsetsize * 4);
}
+ return ret;
}
- return ret;
#endif
#ifdef TARGET_NR_setgroups32
case TARGET_NR_setgroups32:
{
int gidsetsize = arg1;
uint32_t *target_grouplist;
- gid_t *grouplist;
+ g_autofree gid_t *grouplist = NULL;
int i;
- grouplist = alloca(gidsetsize * sizeof(gid_t));
- target_grouplist = lock_user(VERIFY_READ, arg2, gidsetsize * 4, 1);
- if (!target_grouplist) {
- return -TARGET_EFAULT;
+ if (gidsetsize > NGROUPS_MAX || gidsetsize < 0) {
+ return -TARGET_EINVAL;
+ }
+ if (gidsetsize > 0) {
+ grouplist = g_try_new(gid_t, gidsetsize);
+ if (!grouplist) {
+ return -TARGET_ENOMEM;
+ }
+ target_grouplist = lock_user(VERIFY_READ, arg2,
+ gidsetsize * 4, 1);
+ if (!target_grouplist) {
+ return -TARGET_EFAULT;
+ }
+ for (i = 0; i < gidsetsize; i++) {
+ grouplist[i] = tswap32(target_grouplist[i]);
+ }
+ unlock_user(target_grouplist, arg2, 0);
}
- for(i = 0;i < gidsetsize; i++)
- grouplist[i] = tswap32(target_grouplist[i]);
- unlock_user(target_grouplist, arg2, 0);
return get_errno(setgroups(gidsetsize, grouplist));
}
#endif
--
2.39.2
* [Stable-8.0.1 44/59] migration: Handle block device inactivation failures better
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Eric Blake, Juan Quintela, Lukas Straub,
Michael Tokarev
From: Eric Blake <eblake@redhat.com>
Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable. The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail. When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):
(qemu) qemu-kvm: load of migration failed: Input/output error
at which point, our only hope for the guest is for the source to take
back control. With the current code base, the host outputs a message, but then appears to resume:
(qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
(src qemu) info status
VM status: running
but a second migration attempt now asserts:
(src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion. It looks like the problem is as follows:
In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices. In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server. With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
disks by calling bdrv_activate_all(). Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.
Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive. Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again. Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont(). And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.
Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.
See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 403d18ae384239876764bbfa111d6cc5dcb673d1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
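The ordering fix described above can be illustrated with a toy model (this is not QEMU code; all names here are hypothetical stand-ins): the flag must be set before the inactivation attempt, so that a partial failure still leaves it set and the failure path knows it must reactivate disks before any vm_start().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the s->block_inactive ordering fix. */
struct mig_state {
    bool block_inactive;    /* what migration believes */
    bool devices_inactive;  /* ground truth of the block layer */
};

/* Inactivation that fails part-way: some devices end up inactive,
 * but the call as a whole reports failure (e.g. NFS not responding). */
static int inactivate_all_failing(struct mig_state *s)
{
    s->devices_inactive = true;
    return -1;
}

/* Old (buggy) order: flag set only on success, so the failure path
 * sees block_inactive == false and skips reactivation. */
static bool old_completion(struct mig_state *s)
{
    int ret = inactivate_all_failing(s);
    if (ret >= 0) {
        s->block_inactive = true;
    }
    return s->block_inactive;
}

/* New order: flag set before the attempt, so any failure path still
 * knows it must try bdrv_activate_all() before resuming the guest. */
static bool new_completion(struct mig_state *s)
{
    s->block_inactive = true;
    (void)inactivate_all_failing(s);
    return s->block_inactive;
}
```

With the old order the state tracked by migration diverges from the block layer exactly when inactivation fails part-way, which is the scenario that later trips the assertion.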
diff --git a/migration/migration.c b/migration/migration.c
index bda4789193..cb0d42c061 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3444,13 +3444,11 @@ static void migration_completion(MigrationState *s)
MIGRATION_STATUS_DEVICE);
}
if (ret >= 0) {
+ s->block_inactive = inactivate;
qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
inactivate);
}
- if (inactivate && ret >= 0) {
- s->block_inactive = true;
- }
}
qemu_mutex_unlock_iothread();
@@ -3522,6 +3520,7 @@ fail_invalidate:
bdrv_activate_all(&local_err);
if (local_err) {
error_report_err(local_err);
+ s->block_inactive = true;
} else {
s->block_inactive = false;
}
--
2.39.2
* [Stable-8.0.1 45/59] migration: Minor control flow simplification
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (7 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 44/59] migration: Handle block device inactivation failures better Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 46/59] migration: Attempt disk reactivation in more failure scenarios Michael Tokarev
` (13 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Eric Blake, Juan Quintela, Michael Tokarev
From: Eric Blake <eblake@redhat.com>
No need to declare a temporary variable.
Suggested-by: Juan Quintela <quintela@redhat.com>
Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 5d39f44d7ac5c63f53d4d0900ceba9521bc27e49)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/migration/migration.c b/migration/migration.c
index cb0d42c061..08007cef4e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3436,7 +3436,6 @@ static void migration_completion(MigrationState *s)
ret = global_state_store();
if (!ret) {
- bool inactivate = !migrate_colo_enabled();
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
trace_migration_completion_vm_stop(ret);
if (ret >= 0) {
@@ -3444,10 +3443,10 @@ static void migration_completion(MigrationState *s)
MIGRATION_STATUS_DEVICE);
}
if (ret >= 0) {
- s->block_inactive = inactivate;
+ s->block_inactive = !migrate_colo_enabled();
qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
- inactivate);
+ s->block_inactive);
}
}
qemu_mutex_unlock_iothread();
--
2.39.2
* [Stable-8.0.1 46/59] migration: Attempt disk reactivation in more failure scenarios
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (8 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 45/59] migration: Minor control flow simplification Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 47/59] target/arm: Fix vd == vm overlap in sve_ldff1_z Michael Tokarev
` (12 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Eric Blake, Kevin Wolf, Peter Xu, Juan Quintela,
Michael Tokarev
From: Eric Blake <eblake@redhat.com>
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed. This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).
Consolidate the two labels back into one - no matter HOW migration
fails, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then. This also makes the cleanup more like
migrate_fd_cancel().
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6dab4c93ecfae48e2e67b984d1032c1e988d3005)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: minor context tweak near added comment in migration/migration.c)
diff --git a/migration/migration.c b/migration/migration.c
index 08007cef4e..99f86bd6c2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3443,6 +3443,11 @@ static void migration_completion(MigrationState *s)
MIGRATION_STATUS_DEVICE);
}
if (ret >= 0) {
+ /*
+ * Inactivate disks except in COLO, and track that we
+ * have done so in order to remember to reactivate
+ * them if migration fails or is cancelled.
+ */
s->block_inactive = !migrate_colo_enabled();
qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
@@ -3487,13 +3492,13 @@ static void migration_completion(MigrationState *s)
rp_error = await_return_path_close_on_source(s);
trace_migration_return_path_end_after(rp_error);
if (rp_error) {
- goto fail_invalidate;
+ goto fail;
}
}
if (qemu_file_get_error(s->to_dst_file)) {
trace_migration_completion_file_err();
- goto fail_invalidate;
+ goto fail;
}
if (migrate_colo_enabled() && s->state == MIGRATION_STATUS_ACTIVE) {
@@ -3507,26 +3512,25 @@ static void migration_completion(MigrationState *s)
return;
-fail_invalidate:
- /* If not doing postcopy, vm_start() will be called: let's regain
- * control on images.
- */
- if (s->state == MIGRATION_STATUS_ACTIVE ||
- s->state == MIGRATION_STATUS_DEVICE) {
+fail:
+ if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE ||
+ s->state == MIGRATION_STATUS_DEVICE)) {
+ /*
+ * If not doing postcopy, vm_start() will be called: let's
+ * regain control on images.
+ */
Error *local_err = NULL;
qemu_mutex_lock_iothread();
bdrv_activate_all(&local_err);
if (local_err) {
error_report_err(local_err);
- s->block_inactive = true;
} else {
s->block_inactive = false;
}
qemu_mutex_unlock_iothread();
}
-fail:
migrate_set_state(&s->state, current_active_state,
MIGRATION_STATUS_FAILED);
}
--
2.39.2
* [Stable-8.0.1 47/59] target/arm: Fix vd == vm overlap in sve_ldff1_z
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (9 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 46/59] migration: Attempt disk reactivation in more failure scenarios Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 48/59] scsi-generic: fix buffer overflow on block limits inquiry Michael Tokarev
` (11 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Richard Henderson, Peter Maydell, Michael Tokarev
From: Richard Henderson <richard.henderson@linaro.org>
If vd == vm, copy vm to scratch, so that we can pre-zero
the output and still access the gather indices.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1612
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230504104232.1877774-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a6771f2f5cbfbf312e2fb5b1627f38a6bf6321d0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
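The aliasing hazard can be shown with a much-simplified toy gather (this is not the SVE helper itself; the function and sizes are illustrative): pre-zeroing the destination destroys the index vector when the two overlap, unless the indices are first copied to a scratch buffer, which is exactly the shape of the fix below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N = 4 };

/* Toy "gather": pre-zeroes its output the way sve_ldff1_z pre-zeroes
 * vd. If vd == vm, the indices would be wiped before being read, so
 * they are copied to a scratch buffer first. */
static void toy_gather(uint64_t *vd, const uint64_t *vm,
                       const uint64_t *mem)
{
    uint64_t scratch[N];

    if (vd == vm) {                  /* protect against overlap */
        memcpy(scratch, vm, sizeof(scratch));
        vm = scratch;
    }
    memset(vd, 0, N * sizeof(*vd));  /* pre-zero the output */
    for (int i = 0; i < N; i++) {
        vd[i] = mem[vm[i]];          /* indices still intact */
    }
}
```

Without the scratch copy, the memset would turn every index into 0 and the gather would silently load mem[0] four times.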
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index ccf5e5beca..0097522470 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -6727,6 +6727,7 @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
intptr_t reg_off;
SVEHostPage info;
target_ulong addr, in_page;
+ ARMVectorReg scratch;
/* Skip to the first true predicate. */
reg_off = find_next_active(vg, 0, reg_max, esz);
@@ -6736,6 +6737,11 @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
return;
}
+ /* Protect against overlap between vd and vm. */
+ if (unlikely(vd == vm)) {
+ vm = memcpy(&scratch, vm, reg_max);
+ }
+
/*
* Probe the first element, allowing faults.
*/
--
2.39.2
* [Stable-8.0.1 48/59] scsi-generic: fix buffer overflow on block limits inquiry
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (10 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 47/59] target/arm: Fix vd == vm overlap in sve_ldff1_z Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 49/59] target/i386: fix operand size for VCOMI/VUCOMI instructions Michael Tokarev
` (10 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Paolo Bonzini, Théo Maillart, Michael Tokarev
From: Paolo Bonzini <pbonzini@redhat.com>
Using linux 6.x guest, at boot time, an inquiry on a scsi-generic
device makes qemu crash. This is caused by a buffer overflow when
scsi-generic patches the block limits VPD page.
Do the operations on a temporary on-stack buffer that is guaranteed
to be large enough.
Reported-by: Théo Maillart <tmaillart@freebox.fr>
Analyzed-by: Théo Maillart <tmaillart@freebox.fr>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9bd634b2f5e2f10fe35d7609eb83f30583f2e15a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
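The temp-buffer technique in the patch can be sketched in isolation (the helper names below are local stand-ins for QEMU's stl_be_p/ldl_be_p, and the function is illustrative, not the real scsi-generic code): the fixed-offset stores land in a 16-byte on-stack buffer, and only the bytes that actually fit in the reply are copied back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MIN_NON_ZERO(a, b) \
    ((a) == 0 ? (b) : ((b) == 0 ? (a) : MIN(a, b)))

/* Local stand-ins for QEMU's big-endian store/load helpers. */
static void stl_be(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}
static uint32_t ldl_be(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | p[1] << 16 | p[2] << 8 | p[3];
}

/* Sketch of the bounded patching: never store past buflen. The
 * offset-8 and offset-12 stores go into a 16-byte scratch buffer,
 * then only buf_used - 8 bytes are copied back into the reply. */
static void patch_block_limits(uint8_t *reply, int buflen,
                               uint32_t max_transfer)
{
    if (buflen < 8) {
        return;  /* reply too short to patch at all */
    }
    uint8_t buf[16] = {0};
    int buf_used = MIN(buflen, 16);

    memcpy(buf, reply, buf_used);
    stl_be(&buf[8], max_transfer);
    stl_be(&buf[12], MIN_NON_ZERO(max_transfer, ldl_be(&buf[12])));
    memcpy(reply + 8, buf + 8, buf_used - 8);
}
```

The point is that a short guest reply buffer (say 10 bytes) still gets the bytes that fit, while the 4-byte stores themselves always have a full 16 bytes of scratch to write into.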
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index ac9fa662b4..2417f0ad84 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -191,12 +191,16 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
if ((s->type == TYPE_DISK || s->type == TYPE_ZBC) &&
(r->req.cmd.buf[1] & 0x01)) {
page = r->req.cmd.buf[2];
- if (page == 0xb0) {
+ if (page == 0xb0 && r->buflen >= 8) {
+ uint8_t buf[16] = {};
+ uint8_t buf_used = MIN(r->buflen, 16);
uint64_t max_transfer = calculate_max_transfer(s);
- stl_be_p(&r->buf[8], max_transfer);
- /* Also take care of the opt xfer len. */
- stl_be_p(&r->buf[12],
- MIN_NON_ZERO(max_transfer, ldl_be_p(&r->buf[12])));
+
+ memcpy(buf, r->buf, buf_used);
+ stl_be_p(&buf[8], max_transfer);
+ stl_be_p(&buf[12], MIN_NON_ZERO(max_transfer, ldl_be_p(&buf[12])));
+ memcpy(r->buf + 8, buf + 8, buf_used - 8);
+
} else if (s->needs_vpd_bl_emulation && page == 0x00 && r->buflen >= 4) {
/*
* Now we're capable of supplying the VPD Block Limits
--
2.39.2
* [Stable-8.0.1 49/59] target/i386: fix operand size for VCOMI/VUCOMI instructions
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (11 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 48/59] scsi-generic: fix buffer overflow on block limits inquiry Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 50/59] target/i386: fix avx2 instructions vzeroall and vpermdq Michael Tokarev
` (9 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Paolo Bonzini, Gabriele Svelto, Michael Tokarev
From: Paolo Bonzini <pbonzini@redhat.com>
Compared to other SSE instructions, VUCOMISx and VCOMISx are different:
the single- and double-precision versions are distinguished through a
prefix, however they use no prefix and 0x66 for SS and SD respectively,
whereas scalar values are usually associated with 0xF2 and 0xF3.
Because of this, they incorrectly perform a 128-bit memory load instead
of a 32- or 64-bit load. Fix this by writing a custom decoding function.
I tested that the reproducer is fixed and the test-avx output does not
change.
Reported-by: Gabriele Svelto <gsvelto@mozilla.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1637
Fixes: f8d19eec0d53 ("target/i386: reimplement 0x0f 0x28-0x2f, add AVX", 2022-10-18)
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2b55e479e6fcbb466585fd25077a50c32e10dc3a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
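The unusual prefix-to-size mapping can be summarized in a few lines (the flag name below is a hypothetical stand-in, and this is an illustration, not the decoder itself): no prefix selects the 4-byte SS form, the 0x66 data prefix selects the 8-byte SD form, and the memory operand is never a full 16-byte vector.

```c
#include <assert.h>

/* Hypothetical flag for the 0x66 data prefix. */
#define PREFIX_DATA 0x01

/* Operand size selection for (V)UCOMISS/SD and (V)COMISS/SD.
 * Unlike most scalar SSE ops (0xF3 = SS, 0xF2 = SD), these use
 * no prefix for SS and 0x66 for SD, so the memory operand is
 * 4 or 8 bytes -- the bug was loading 16 bytes regardless. */
static int vcomis_mem_operand_bytes(int prefix)
{
    return (prefix & PREFIX_DATA) ? 8 : 4;  /* SD : SS */
}
```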
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 4fdd87750b..48fefaffdf 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -783,6 +783,17 @@ static void decode_0F2D(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui
*entry = *decode_by_prefix(s, opcodes_0F2D);
}
+static void decode_VxCOMISx(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+ /*
+ * VUCOMISx and VCOMISx are different and use no-prefix and 0x66 for SS and SD
+ * respectively. Scalar values usually are associated with 0xF2 and 0xF3, for
+ * which X86_VEX_REPScalar exists, but here it has to be decoded by hand.
+ */
+ entry->s1 = entry->s2 = (s->prefix & PREFIX_DATA ? X86_SIZE_sd : X86_SIZE_ss);
+ entry->gen = (*b == 0x2E ? gen_VUCOMI : gen_VCOMI);
+}
+
static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) {
@@ -871,8 +882,8 @@ static const X86OpEntry opcodes_0F[256] = {
[0x2B] = X86_OP_GROUP0(0F2B),
[0x2C] = X86_OP_GROUP0(0F2C),
[0x2D] = X86_OP_GROUP0(0F2D),
- [0x2E] = X86_OP_ENTRY3(VUCOMI, None,None, V,x, W,x, vex4 p_00_66),
- [0x2F] = X86_OP_ENTRY3(VCOMI, None,None, V,x, W,x, vex4 p_00_66),
+ [0x2E] = X86_OP_GROUP3(VxCOMISx, None,None, V,x, W,x, vex3 p_00_66), /* VUCOMISS/SD */
+ [0x2F] = X86_OP_GROUP3(VxCOMISx, None,None, V,x, W,x, vex3 p_00_66), /* VCOMISS/SD */
[0x38] = X86_OP_GROUP0(0F38),
[0x3a] = X86_OP_GROUP0(0F3A),
--
2.39.2
* [Stable-8.0.1 50/59] target/i386: fix avx2 instructions vzeroall and vpermdq
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (12 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 49/59] target/i386: fix operand size for VCOMI/VUCOMI instructions Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 51/59] vhost: fix possible wrap in SVQ descriptor ring Michael Tokarev
` (8 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Xinyu Li, Paolo Bonzini, Michael Tokarev
From: Xinyu Li <lixinyu20s@ict.ac.cn>
vzeroall: xmm_regs should be used instead of xmm_t0
vpermdq: bit 3 and 7 of imm should be considered
Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Message-Id: <20230510145222.586487-1-lixinyu20s@ict.ac.cn>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 056d649007bc9fdae9f1d576e77c1316e9a34468)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
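The vpermdq part of the fix concerns the zeroing bits of the VPERM2I128/VPERM2F128 immediate. A simplified sketch (each 128-bit lane modeled as a single uint64_t for brevity; not the real helper): order[1:0] picks the low result lane from {v.lo, v.hi, s.lo, s.hi}, order[5:4] picks the high lane, and bits 3 and 7 force the respective lane to zero, which is what the original helper failed to honor.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } ymm2;  /* two 128-bit lanes */

/* Select one source lane according to a 2-bit selector. */
static uint64_t pick(ymm2 v, ymm2 s, int sel)
{
    switch (sel & 3) {
    case 0:  return v.lo;
    case 1:  return v.hi;
    case 2:  return s.lo;
    default: return s.hi;
    }
}

/* VPERM2x128-style permute: bits 3 and 7 of the immediate zero the
 * low and high result lanes respectively. */
static ymm2 vperm2x128(ymm2 v, ymm2 s, uint32_t order)
{
    ymm2 d;
    d.lo = (order & 0x08) ? 0 : pick(v, s, order);
    d.hi = (order & 0x80) ? 0 : pick(v, s, order >> 4);
    return d;
}
```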
diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 0bd6bfad8a..fb63af7afa 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2497,6 +2497,14 @@ void helper_vpermdq_ymm(Reg *d, Reg *v, Reg *s, uint32_t order)
d->Q(1) = r1;
d->Q(2) = r2;
d->Q(3) = r3;
+ if (order & 0x8) {
+ d->Q(0) = 0;
+ d->Q(1) = 0;
+ }
+ if (order & 0x80) {
+ d->Q(2) = 0;
+ d->Q(3) = 0;
+ }
}
void helper_vpermq_ymm(Reg *d, Reg *s, uint32_t order)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 95fb4f52fa..4fe8dec427 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -2285,7 +2285,7 @@ static void gen_VZEROALL(DisasContext *s, CPUX86State *env, X86DecodedInsn *deco
{
TCGv_ptr ptr = tcg_temp_new_ptr();
- tcg_gen_addi_ptr(ptr, cpu_env, offsetof(CPUX86State, xmm_t0));
+ tcg_gen_addi_ptr(ptr, cpu_env, offsetof(CPUX86State, xmm_regs));
gen_helper_memset(ptr, ptr, tcg_constant_i32(0),
tcg_constant_ptr(CPU_NB_REGS * sizeof(ZMMReg)));
}
--
2.39.2
* [Stable-8.0.1 51/59] vhost: fix possible wrap in SVQ descriptor ring
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (13 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 50/59] target/i386: fix avx2 instructions vzeroall and vpermdq Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 52/59] hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0 Michael Tokarev
` (7 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Hawkins Jiawei, Eugenio Pérez,
Michael S . Tsirkin, Lei Yang, Michael Tokarev
From: Hawkins Jiawei <yin31149@gmail.com>
QEMU invokes vhost_svq_add() when adding a guest's element
into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots()
to check whether QEMU can add the element into SVQ. If there is
enough space, then QEMU combines some out descriptors and some
in descriptors into one descriptor chain, and adds it into
`svq->vring.desc` by vhost_svq_vring_write_descs().
Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx`
in vhost_svq_available_slots() returns the number of occupied elements,
or the number of descriptor chains, instead of the number of occupied
descriptors, which may cause wrapping in SVQ descriptor ring.
Here is an example. In vhost_handle_guest_kick(), QEMU forwards
as many available buffers to device by virtqueue_pop() and
vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
and then this element is added into SVQ by vhost_svq_add_element(),
a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and
vhost_svq_add_element() `svq->vring.num` times,
vhost_svq_available_slots() thinks QEMU just ran out of slots and
everything should work fine. But in fact, virtqueue_pop() returns
`svq->vring.num` elements or descriptor chains, more than
`svq->vring.num` descriptors due to guest memory fragmentation,
and this causes wrapping in SVQ descriptor ring.
This bug is valid even before marking the descriptors used.
If the guest memory is fragmented, SVQ must add chains
so it can try to add more descriptors than possible.
This patch solves it by adding `num_free` field in
VhostShadowVirtqueue structure and updating this field
in vhost_svq_add() and vhost_svq_get_buf(), to record
the number of free descriptors.
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230509084817.3973-1-yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
(cherry picked from commit 5d410557dea452f6231a7c66155e29a37e168528)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
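The accounting bug is easy to reproduce in a toy model (illustrative only; field names are simplified stand-ins for the VhostShadowVirtqueue members): with a ring of 4 descriptors and guest elements that each occupy a 2-descriptor chain, counting chains says there is still room when the ring is in fact full.

```c
#include <assert.h>

/* Toy model of SVQ slot accounting: a ring of `num` descriptors where
 * each element occupies a chain of `ndescs` descriptors. */
struct svq {
    int num;       /* total descriptors in the ring */
    int elems;     /* occupied elements (chains)    */
    int num_free;  /* free descriptors (the fix)    */
};

/* Old check: counts elements, not descriptors. */
static int slots_by_chains(const struct svq *q)
{
    return q->num - q->elems;
}

/* Fixed check: tracks the actual number of free descriptors. */
static int slots_by_descs(const struct svq *q)
{
    return q->num_free;
}

static void add_chain(struct svq *q, int ndescs)
{
    q->elems += 1;
    q->num_free -= ndescs;
}
```

After two 2-descriptor chains the ring is full, yet the old check still reports two free slots; a third add would wrap the descriptor ring.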
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 8361e70d1b..bd7c12b6d3 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
*/
static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
{
- return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
+ return svq->num_free;
}
/**
@@ -263,6 +263,7 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
return -EINVAL;
}
+ svq->num_free -= ndescs;
svq->desc_state[qemu_head].elem = elem;
svq->desc_state[qemu_head].ndescs = ndescs;
vhost_svq_kick(svq);
@@ -449,6 +450,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
svq->desc_next[last_used_chain] = svq->free_head;
svq->free_head = used_elem.id;
+ svq->num_free += num;
*len = used_elem.len;
return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
@@ -659,6 +661,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
svq->iova_tree = iova_tree;
svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
+ svq->num_free = svq->vring.num;
driver_size = vhost_svq_driver_area_size(svq);
device_size = vhost_svq_device_area_size(svq);
svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size);
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 926a4897b1..6efe051a70 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue {
/* Next head to consume from the device */
uint16_t last_used_idx;
+
+ /* Size of SVQ vring free descriptors */
+ uint16_t num_free;
} VhostShadowVirtqueue;
bool vhost_svq_valid_features(uint64_t features, Error **errp);
--
2.39.2
* [Stable-8.0.1 52/59] hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (14 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 51/59] vhost: fix possible wrap in SVQ descriptor ring Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 53/59] virtio-net: not enable vq reset feature unconditionally Michael Tokarev
` (6 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Leonardo Bras, Michael S . Tsirkin, Jonathan Cameron,
Peter Xu, Juan Quintela, Fiona Ebner, Michael Tokarev
From: Leonardo Bras <leobras@redhat.com>
Since its implementation in v8.0.0-rc0, having the PCI_ERR_UNCOR_MASK
set for machine types < 8.0 will cause migration to fail if the target
QEMU version is < 8.0.0:
qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load e1000e:parent_obj
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:02.0/e1000e'
qemu-system-x86_64: load of migration failed: Invalid argument
The above test migrated a 7.2 machine type from QEMU master to QEMU 7.2.0,
with this cmdline:
./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
In order to fix this, property x-pcie-err-unc-mask was introduced to
control when PCI_ERR_UNCOR_MASK is enabled. This property is enabled by
default, but is disabled if machine type <= 7.2.
Fixes: 010746ae1d ("hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20230503002701.854329-1-leobras@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1576
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
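The gating pattern used here can be sketched in plain C (illustrative only; the real property is wired up via DEFINE_PROP_BIT and the hw_compat_7_2 table): the capability bit defaults to on, the <= 7.2 compat entry turns it off, and AER setup only writes the PCI_ERR_UNCOR_MASK register when the bit is present, keeping config space identical to what QEMU < 8.0 produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QEMU_PCIE_ERR_UNC_MASK_BITNR 11
#define QEMU_PCIE_ERR_UNC_MASK (1u << QEMU_PCIE_ERR_UNC_MASK_BITNR)

/* Property default is on; the hw_compat_7_2 entry clears it for
 * machine types <= 7.2 so the migration stream stays compatible. */
static uint32_t device_cap_present(bool machine_7_2_or_older)
{
    uint32_t cap = QEMU_PCIE_ERR_UNC_MASK;
    if (machine_7_2_or_older) {
        cap &= ~QEMU_PCIE_ERR_UNC_MASK;
    }
    return cap;
}

/* pcie_aer_init only touches PCI_ERR_UNCOR_MASK if the bit is set. */
static bool aer_inits_uncor_mask(uint32_t cap_present)
{
    return (cap_present & QEMU_PCIE_ERR_UNC_MASK) != 0;
}
```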
diff --git a/hw/core/machine.c b/hw/core/machine.c
index cd13b8b0a3..5060119952 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -43,6 +43,7 @@ GlobalProperty hw_compat_7_2[] = {
{ "e1000e", "migrate-timadj", "off" },
{ "virtio-mem", "x-early-migration", "false" },
{ "migration", "x-preempt-pre-7-2", "true" },
+ { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
};
const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index def5000e7b..8ad4349e96 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -79,6 +79,8 @@ static Property pci_props[] = {
DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
failover_pair_id),
DEFINE_PROP_UINT32("acpi-index", PCIDevice, acpi_index, 0),
+ DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
+ QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
DEFINE_PROP_END_OF_LIST()
};
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 103667c368..374d593ead 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
PCI_ERR_UNC_SUPPORTED);
- pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
- PCI_ERR_UNC_MASK_DEFAULT);
- pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
- PCI_ERR_UNC_SUPPORTED);
+
+ if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
+ pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+ PCI_ERR_UNC_MASK_DEFAULT);
+ pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+ PCI_ERR_UNC_SUPPORTED);
+ }
pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
PCI_ERR_UNC_SEVERITY_DEFAULT);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d5a40cd058..6dc6742fc4 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -207,6 +207,8 @@ enum {
QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
#define QEMU_PCIE_CXL_BITNR 10
QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
+#define QEMU_PCIE_ERR_UNC_MASK_BITNR 11
+ QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
};
typedef struct PCIINTxRoute {
--
2.39.2
* [Stable-8.0.1 53/59] virtio-net: not enable vq reset feature unconditionally
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (15 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 52/59] hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0 Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 54/59] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request Michael Tokarev
` (5 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Eugenio Pérez, Xuan Zhuo, Michael S . Tsirkin,
Michael Tokarev
From: Eugenio Pérez <eperezma@redhat.com>
Commit 93a97dc5200a ("virtio-net: enable vq reset feature")
unconditionally enables the vq reset feature as long as the device is
emulated. This makes it impossible to actually disable the feature,
and it causes migration problems from QEMU versions earlier than 7.2.
The commit is unneeded in its entirety, as the device feature system
already enables or disables the feature properly.
This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413.
Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230504101447.389398-1-eperezma@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1fac00f70b3261050af5564b20ca55c1b2a3059a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 53e1c32643..4ea33b6e2e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -805,7 +805,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
}
if (!get_vhost_net(nc->peer)) {
- virtio_add_feature(&features, VIRTIO_F_RING_RESET);
return features;
}
--
2.39.2
* [Stable-8.0.1 54/59] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (16 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 53/59] virtio-net: not enable vq reset feature unconditionally Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 55/59] aio-posix: do not nest poll handlers Michael Tokarev
` (4 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Mauro Matteo Cascella, Yiming Tao, Gonglei,
zhenwei pi, Michael S . Tsirkin, Michael Tokarev
From: Mauro Matteo Cascella <mcascell@redhat.com>
Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype.
Fixes: 0e660a6f90a ("crypto: Introduce RSA algorithm")
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reported-by: Yiming Tao <taoym@zju.edu.cn>
Message-Id: <20230509075317.1132301-1-mcascell@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3e69908907f8d3dd20d5753b0777a6e3824ba824)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 802e1b9659..a1d122b9aa 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -476,15 +476,17 @@ static void virtio_crypto_free_request(VirtIOCryptoReq *req)
size_t max_len;
CryptoDevBackendSymOpInfo *op_info = req->op_info.u.sym_op_info;
- max_len = op_info->iv_len +
- op_info->aad_len +
- op_info->src_len +
- op_info->dst_len +
- op_info->digest_result_len;
-
- /* Zeroize and free request data structure */
- memset(op_info, 0, sizeof(*op_info) + max_len);
- g_free(op_info);
+ if (op_info) {
+ max_len = op_info->iv_len +
+ op_info->aad_len +
+ op_info->src_len +
+ op_info->dst_len +
+ op_info->digest_result_len;
+
+ /* Zeroize and free request data structure */
+ memset(op_info, 0, sizeof(*op_info) + max_len);
+ g_free(op_info);
+ }
} else if (req->flags == QCRYPTODEV_BACKEND_ALG_ASYM) {
CryptoDevBackendAsymOpInfo *op_info = req->op_info.u.asym_op_info;
if (op_info) {
--
2.39.2
* [Stable-8.0.1 55/59] aio-posix: do not nest poll handlers
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (17 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 54/59] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 56/59] tested: add test for nested aio_poll() in " Michael Tokarev
` (3 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Stefan Hajnoczi, Kevin Wolf,
Emanuele Giuseppe Esposito, Paolo Bonzini, Michael Tokarev
From: Stefan Hajnoczi <stefanha@redhat.com>
QEMU's event loop supports nesting, which means that event handler
functions may themselves call aio_poll(). The condition that triggered a
handler must be reset before the nested aio_poll() call, otherwise the
same handler will be called and immediately re-enter aio_poll. This
leads to an infinite loop and stack exhaustion.
Poll handlers are especially prone to this issue, because they typically
reset their condition by finishing the processing of pending work.
Unfortunately it is during the processing of pending work that nested
aio_poll() calls typically occur and the condition has not yet been
reset.
Disable a poll handler during ->io_poll_ready() so that a nested
aio_poll() call cannot invoke ->io_poll_ready() again. As a result, the
disabled poll handler and its associated fd handler do not run during
the nested aio_poll(). Calling aio_set_fd_handler() from inside nested
aio_poll() could cause it to run again. If the fd handler is pending
inside nested aio_poll(), then it will also run again.
In theory fd handlers can be affected by the same issue, but they are
more likely to reset the condition before calling nested aio_poll().
This is a special case and it's somewhat complex, but I don't see a way
around it as long as nested aio_poll() is supported.
Cc: qemu-stable@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2186181
Fixes: c38270692593 ("block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK")
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230502184134.534703-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6d740fb01b9f0f5ea7a82f4d5e458d91940a19ee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/util/aio-posix.c b/util/aio-posix.c
index a8be940f76..34bc2a64d8 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -353,8 +353,19 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
poll_ready && revents == 0 &&
aio_node_check(ctx, node->is_external) &&
node->io_poll_ready) {
+ /*
+ * Remove temporarily to avoid infinite loops when ->io_poll_ready()
+ * calls aio_poll() before clearing the condition that made the poll
+ * handler become ready.
+ */
+ QLIST_SAFE_REMOVE(node, node_poll);
+
node->io_poll_ready(node->opaque);
+ if (!QLIST_IS_INSERTED(node, node_poll)) {
+ QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
+ }
+
/*
* Return early since revents was zero. aio_notify() does not count as
* progress.
--
2.39.2
* [Stable-8.0.1 56/59] tested: add test for nested aio_poll() in poll handlers
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (18 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 55/59] aio-posix: do not nest poll handlers Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 57/59] block: compile out assert_bdrv_graph_readable() by default Michael Tokarev
` (2 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Stefan Hajnoczi, Kevin Wolf, Michael Tokarev
From: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230502184134.534703-3-stefanha@redhat.com>
[kwolf: Restrict to CONFIG_POSIX, Windows doesn't support polling]
Tested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 844a12a63e12b1235a8fc17f9b278929dc6eb00e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 3bc78d8660..8ed81786ee 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -114,7 +114,10 @@ if have_block
tests += {'test-crypto-xts': [crypto, io]}
endif
if 'CONFIG_POSIX' in config_host
- tests += {'test-image-locking': [testblock]}
+ tests += {
+ 'test-image-locking': [testblock],
+ 'test-nested-aio-poll': [testblock],
+ }
endif
if config_host_data.get('CONFIG_REPLICATION')
tests += {'test-replication': [testblock]}
diff --git a/tests/unit/test-nested-aio-poll.c b/tests/unit/test-nested-aio-poll.c
new file mode 100644
index 0000000000..9bbe18b839
--- /dev/null
+++ b/tests/unit/test-nested-aio-poll.c
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Test that poll handlers are not re-entrant in nested aio_poll()
+ *
+ * Copyright Red Hat
+ *
+ * Poll handlers are usually level-triggered. That means they continue firing
+ * until the condition is reset (e.g. a virtqueue becomes empty). If a poll
+ * handler calls nested aio_poll() before the condition is reset, then infinite
+ * recursion occurs.
+ *
+ * aio_poll() is supposed to prevent this by disabling poll handlers in nested
+ * aio_poll() calls. This test case checks that this is indeed what happens.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "qapi/error.h"
+
+typedef struct {
+ AioContext *ctx;
+
+ /* This is the EventNotifier that drives the test */
+ EventNotifier poll_notifier;
+
+ /* This EventNotifier is only used to wake aio_poll() */
+ EventNotifier dummy_notifier;
+
+ bool nested;
+} TestData;
+
+static void io_read(EventNotifier *notifier)
+{
+ fprintf(stderr, "%s %p\n", __func__, notifier);
+ event_notifier_test_and_clear(notifier);
+}
+
+static bool io_poll_true(void *opaque)
+{
+ fprintf(stderr, "%s %p\n", __func__, opaque);
+ return true;
+}
+
+static bool io_poll_false(void *opaque)
+{
+ fprintf(stderr, "%s %p\n", __func__, opaque);
+ return false;
+}
+
+static void io_poll_ready(EventNotifier *notifier)
+{
+ TestData *td = container_of(notifier, TestData, poll_notifier);
+
+ fprintf(stderr, "> %s\n", __func__);
+
+ g_assert(!td->nested);
+ td->nested = true;
+
+ /* Wake the following nested aio_poll() call */
+ event_notifier_set(&td->dummy_notifier);
+
+ /* This nested event loop must not call io_poll()/io_poll_ready() */
+ g_assert(aio_poll(td->ctx, true));
+
+ td->nested = false;
+
+ fprintf(stderr, "< %s\n", __func__);
+}
+
+/* dummy_notifier never triggers */
+static void io_poll_never_ready(EventNotifier *notifier)
+{
+ g_assert_not_reached();
+}
+
+static void test(void)
+{
+ TestData td = {
+ .ctx = aio_context_new(&error_abort),
+ };
+
+ qemu_set_current_aio_context(td.ctx);
+
+ /* Enable polling */
+ aio_context_set_poll_params(td.ctx, 1000000, 2, 2, &error_abort);
+
+ /*
+ * The GSource is unused but this has the side-effect of changing the fdmon
+ * that AioContext uses.
+ */
+ aio_get_g_source(td.ctx);
+
+ /* Make the event notifier active (set) right away */
+ event_notifier_init(&td.poll_notifier, 1);
+ aio_set_event_notifier(td.ctx, &td.poll_notifier, false,
+ io_read, io_poll_true, io_poll_ready);
+
+ /* This event notifier will be used later */
+ event_notifier_init(&td.dummy_notifier, 0);
+ aio_set_event_notifier(td.ctx, &td.dummy_notifier, false,
+ io_read, io_poll_false, io_poll_never_ready);
+
+ /* Consume aio_notify() */
+ g_assert(!aio_poll(td.ctx, false));
+
+ /*
+ * Run the io_read() handler. This has the side-effect of activating
+ * polling in future aio_poll() calls.
+ */
+ g_assert(aio_poll(td.ctx, true));
+
+ /* The second time around the io_poll()/io_poll_ready() handler runs */
+ g_assert(aio_poll(td.ctx, true));
+
+ /* Run io_poll()/io_poll_ready() one more time to show it keeps working */
+ g_assert(aio_poll(td.ctx, true));
+
+ aio_set_event_notifier(td.ctx, &td.dummy_notifier, false,
+ NULL, NULL, NULL);
+ aio_set_event_notifier(td.ctx, &td.poll_notifier, false, NULL, NULL, NULL);
+ event_notifier_cleanup(&td.dummy_notifier);
+ event_notifier_cleanup(&td.poll_notifier);
+ aio_context_unref(td.ctx);
+}
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+ g_test_add_func("/nested-aio-poll", test);
+ return g_test_run();
+}
--
2.39.2
* [Stable-8.0.1 57/59] block: compile out assert_bdrv_graph_readable() by default
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (19 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 56/59] tested: add test for nested aio_poll() in " Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 58/59] graph-lock: Disable locking for now Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 59/59] nbd/server: Fix drained_poll to wake coroutine in right AioContext Michael Tokarev
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Stefan Hajnoczi, Kevin Wolf, Michael Tokarev
From: Stefan Hajnoczi <stefanha@redhat.com>
reader_count() is a performance bottleneck because the global
aio_context_list_lock mutex causes thread contention. Put this debugging
assertion behind a new ./configure --enable-debug-graph-lock option and
disable it by default.
The --enable-debug-graph-lock option is also enabled by the more general
--enable-debug option.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230501173443.153062-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 58a2e3f5c37be02dac3086b81bdda9414b931edf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: pick this one up so that the next patch, which disables this, applies cleanly)
diff --git a/block/graph-lock.c b/block/graph-lock.c
index 454c31e691..259a7a0bde 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -265,7 +265,10 @@ void bdrv_graph_rdunlock_main_loop(void)
void assert_bdrv_graph_readable(void)
{
+ /* reader_count() is slow due to aio_context_list_lock lock contention */
+#ifdef CONFIG_DEBUG_GRAPH_LOCK
assert(qemu_in_main_thread() || reader_count());
+#endif
}
void assert_bdrv_graph_writable(void)
diff --git a/configure b/configure
index 800b5850f4..a62a3e6be9 100755
--- a/configure
+++ b/configure
@@ -806,6 +806,7 @@ for opt do
--enable-debug)
# Enable debugging options that aren't excessively noisy
debug_tcg="yes"
+ meson_option_parse --enable-debug-graph-lock ""
meson_option_parse --enable-debug-mutex ""
meson_option_add -Doptimization=0
fortify_source="no"
diff --git a/meson.build b/meson.build
index c7e486e087..30447cfaef 100644
--- a/meson.build
+++ b/meson.build
@@ -1956,6 +1956,7 @@ if get_option('debug_stack_usage') and have_coroutine_pool
have_coroutine_pool = false
endif
config_host_data.set10('CONFIG_COROUTINE_POOL', have_coroutine_pool)
+config_host_data.set('CONFIG_DEBUG_GRAPH_LOCK', get_option('debug_graph_lock'))
config_host_data.set('CONFIG_DEBUG_MUTEX', get_option('debug_mutex'))
config_host_data.set('CONFIG_DEBUG_STACK_USAGE', get_option('debug_stack_usage'))
config_host_data.set('CONFIG_GPROF', get_option('gprof'))
@@ -3837,6 +3838,7 @@ summary_info += {'PIE': get_option('b_pie')}
summary_info += {'static build': config_host.has_key('CONFIG_STATIC')}
summary_info += {'malloc trim support': has_malloc_trim}
summary_info += {'membarrier': have_membarrier}
+summary_info += {'debug graph lock': get_option('debug_graph_lock')}
summary_info += {'debug stack usage': get_option('debug_stack_usage')}
summary_info += {'mutex debugging': get_option('debug_mutex')}
summary_info += {'memory allocator': get_option('malloc')}
diff --git a/meson_options.txt b/meson_options.txt
index fc9447d267..bc857fe68b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -311,6 +311,8 @@ option('rng_none', type: 'boolean', value: false,
description: 'dummy RNG, avoid using /dev/(u)random and getrandom()')
option('coroutine_pool', type: 'boolean', value: true,
description: 'coroutine freelist (better performance)')
+option('debug_graph_lock', type: 'boolean', value: false,
+ description: 'graph lock debugging support')
option('debug_mutex', type: 'boolean', value: false,
description: 'mutex debugging support')
option('debug_stack_usage', type: 'boolean', value: false,
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 009fab1515..30e1f25259 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -21,6 +21,8 @@ meson_options_help() {
printf "%s\n" ' QEMU'
printf "%s\n" ' --enable-cfi Control-Flow Integrity (CFI)'
printf "%s\n" ' --enable-cfi-debug Verbose errors in case of CFI violation'
+ printf "%s\n" ' --enable-debug-graph-lock'
+ printf "%s\n" ' graph lock debugging support'
printf "%s\n" ' --enable-debug-mutex mutex debugging support'
printf "%s\n" ' --enable-debug-stack-usage'
printf "%s\n" ' measure coroutine stack usage'
@@ -249,6 +251,8 @@ _meson_option_parse() {
--datadir=*) quote_sh "-Ddatadir=$2" ;;
--enable-dbus-display) printf "%s" -Ddbus_display=enabled ;;
--disable-dbus-display) printf "%s" -Ddbus_display=disabled ;;
+ --enable-debug-graph-lock) printf "%s" -Ddebug_graph_lock=true ;;
+ --disable-debug-graph-lock) printf "%s" -Ddebug_graph_lock=false ;;
--enable-debug-mutex) printf "%s" -Ddebug_mutex=true ;;
--disable-debug-mutex) printf "%s" -Ddebug_mutex=false ;;
--enable-debug-stack-usage) printf "%s" -Ddebug_stack_usage=true ;;
--
2.39.2
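Editorial note: for readers tracking this backport, the new knob is used like any other configure switch. The invocations below are a sketch of the expected usage based on the hunks above, not verified against a built tree:

```sh
# Re-enable the reader_count() assertion explicitly:
./configure --enable-debug-graph-lock

# --enable-debug now implies --enable-debug-graph-lock (plus debug-mutex):
./configure --enable-debug

# Default build: CONFIG_DEBUG_GRAPH_LOCK unset, assertion compiled out.
./configure
```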
* [Stable-8.0.1 58/59] graph-lock: Disable locking for now
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (20 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 57/59] block: compile out assert_bdrv_graph_readable() by default Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
2023-05-23 10:15 ` [Stable-8.0.1 59/59] nbd/server: Fix drained_poll to wake coroutine in right AioContext Michael Tokarev
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Kevin Wolf, Eric Blake, Michael Tokarev
From: Kevin Wolf <kwolf@redhat.com>
In QEMU 8.0, we've been seeing deadlocks in bdrv_graph_wrlock(). They
come from callers that hold an AioContext lock, which is not allowed
during polling. In theory, we could temporarily release the lock, but
callers are inconsistent about whether they hold a lock, and if they do,
some are also confused about which one they hold. While all of this is
fixable, it's not trivial, and the best course of action for 8.0.1 is
probably just disabling the graph locking code temporarily.
We don't currently rely on graph locking yet. It is supposed to replace
the AioContext lock eventually to enable multiqueue support, but as long
as we still have the AioContext lock, it is sufficient without the graph
lock. Once the AioContext lock goes away, the deadlock doesn't exist any
more either and this commit can be reverted. (Of course, it can also be
reverted while the AioContext lock still exists if the callers have been
fixed.)
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230517152834.277483-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 80fc5d260002432628710f8b0c7cfc7d9b97bb9d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/block/graph-lock.c b/block/graph-lock.c
index 259a7a0bde..2490926c90 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -30,8 +30,10 @@ BdrvGraphLock graph_lock;
/* Protects the list of aiocontext and orphaned_reader_count */
static QemuMutex aio_context_list_lock;
+#if 0
/* Written and read with atomic operations. */
static int has_writer;
+#endif
/*
* A reader coroutine could move from an AioContext to another.
@@ -88,6 +90,7 @@ void unregister_aiocontext(AioContext *ctx)
g_free(ctx->bdrv_graph);
}
+#if 0
static uint32_t reader_count(void)
{
BdrvGraphRWlock *brdv_graph;
@@ -105,10 +108,17 @@ static uint32_t reader_count(void)
assert((int32_t)rd >= 0);
return rd;
}
+#endif
void bdrv_graph_wrlock(void)
{
GLOBAL_STATE_CODE();
+ /*
+ * TODO Some callers hold an AioContext lock when this is called, which
+ * causes deadlocks. Reenable once the AioContext locking is cleaned up (or
+ * AioContext locks are gone).
+ */
+#if 0
assert(!qatomic_read(&has_writer));
/* Make sure that constantly arriving new I/O doesn't cause starvation */
@@ -139,11 +149,13 @@ void bdrv_graph_wrlock(void)
} while (reader_count() >= 1);
bdrv_drain_all_end();
+#endif
}
void bdrv_graph_wrunlock(void)
{
GLOBAL_STATE_CODE();
+#if 0
QEMU_LOCK_GUARD(&aio_context_list_lock);
assert(qatomic_read(&has_writer));
@@ -155,10 +167,13 @@ void bdrv_graph_wrunlock(void)
/* Wake up all coroutine that are waiting to read the graph */
qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
+#endif
}
void coroutine_fn bdrv_graph_co_rdlock(void)
{
+ /* TODO Reenable when wrlock is reenabled */
+#if 0
BdrvGraphRWlock *bdrv_graph;
bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
@@ -223,10 +238,12 @@ void coroutine_fn bdrv_graph_co_rdlock(void)
qemu_co_queue_wait(&reader_queue, &aio_context_list_lock);
}
}
+#endif
}
void coroutine_fn bdrv_graph_co_rdunlock(void)
{
+#if 0
BdrvGraphRWlock *bdrv_graph;
bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
@@ -249,6 +266,7 @@ void coroutine_fn bdrv_graph_co_rdunlock(void)
if (qatomic_read(&has_writer)) {
aio_wait_kick();
}
+#endif
}
void bdrv_graph_rdlock_main_loop(void)
@@ -266,13 +284,19 @@ void bdrv_graph_rdunlock_main_loop(void)
void assert_bdrv_graph_readable(void)
{
/* reader_count() is slow due to aio_context_list_lock lock contention */
+ /* TODO Reenable when wrlock is reenabled */
+#if 0
#ifdef CONFIG_DEBUG_GRAPH_LOCK
assert(qemu_in_main_thread() || reader_count());
#endif
+#endif
}
void assert_bdrv_graph_writable(void)
{
assert(qemu_in_main_thread());
+ /* TODO Reenable when wrlock is reenabled */
+#if 0
assert(qatomic_read(&has_writer));
+#endif
}
--
2.39.2
* [Stable-8.0.1 59/59] nbd/server: Fix drained_poll to wake coroutine in right AioContext
2023-05-23 10:14 [Stable-8.0.1 v2 00/59] Patch Round-up for stable 8.0.1, freeze on 2023-05-27 Michael Tokarev
` (21 preceding siblings ...)
2023-05-23 10:15 ` [Stable-8.0.1 58/59] graph-lock: Disable locking for now Michael Tokarev
@ 2023-05-23 10:15 ` Michael Tokarev
22 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2023-05-23 10:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Kevin Wolf, Eric Blake, Michael Tokarev
From: Kevin Wolf <kwolf@redhat.com>
nbd_drained_poll() generally runs in the main thread, not whatever
iothread the NBD server coroutine is meant to run in, so it can't
directly reenter the coroutines to wake them up.
The code seems to have the right intention: it specifies the correct
AioContext when it calls qemu_aio_coroutine_enter(). However, this
function doesn't schedule the coroutine to run in that AioContext; it
assumes it is already called in the home thread of the AioContext.
To fix this, add a new thread-safe qio_channel_wake_read() that can be
called in the main thread to wake up the coroutine in its AioContext,
and use this in nbd_drained_poll().
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230517152834.277483-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7c1f51bf38de8cea4ed5030467646c37b46edeb7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/include/io/channel.h b/include/io/channel.h
index 153fbd2904..2b905423a9 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -757,6 +757,16 @@ void qio_channel_detach_aio_context(QIOChannel *ioc);
void coroutine_fn qio_channel_yield(QIOChannel *ioc,
GIOCondition condition);
+/**
+ * qio_channel_wake_read:
+ * @ioc: the channel object
+ *
+ * If qio_channel_yield() is currently waiting for the channel to become
+ * readable, interrupt it and reenter immediately. This function is safe to call
+ * from any thread.
+ */
+void qio_channel_wake_read(QIOChannel *ioc);
+
/**
* qio_channel_wait:
* @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index a8c7f11649..3c9b7beb65 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -19,6 +19,7 @@
*/
#include "qemu/osdep.h"
+#include "block/aio-wait.h"
#include "io/channel.h"
#include "qapi/error.h"
#include "qemu/main-loop.h"
@@ -514,7 +515,11 @@ int qio_channel_flush(QIOChannel *ioc,
static void qio_channel_restart_read(void *opaque)
{
QIOChannel *ioc = opaque;
- Coroutine *co = ioc->read_coroutine;
+ Coroutine *co = qatomic_xchg(&ioc->read_coroutine, NULL);
+
+ if (!co) {
+ return;
+ }
/* Assert that aio_co_wake() reenters the coroutine directly */
assert(qemu_get_current_aio_context() ==
@@ -525,7 +530,11 @@ static void qio_channel_restart_read(void *opaque)
static void qio_channel_restart_write(void *opaque)
{
QIOChannel *ioc = opaque;
- Coroutine *co = ioc->write_coroutine;
+ Coroutine *co = qatomic_xchg(&ioc->write_coroutine, NULL);
+
+ if (!co) {
+ return;
+ }
/* Assert that aio_co_wake() reenters the coroutine directly */
assert(qemu_get_current_aio_context() ==
@@ -568,7 +577,11 @@ void qio_channel_detach_aio_context(QIOChannel *ioc)
void coroutine_fn qio_channel_yield(QIOChannel *ioc,
GIOCondition condition)
{
+ AioContext *ioc_ctx = ioc->ctx ?: qemu_get_aio_context();
+
assert(qemu_in_coroutine());
+ assert(in_aio_context_home_thread(ioc_ctx));
+
if (condition == G_IO_IN) {
assert(!ioc->read_coroutine);
ioc->read_coroutine = qemu_coroutine_self();
@@ -580,18 +593,26 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
}
qio_channel_set_aio_fd_handlers(ioc);
qemu_coroutine_yield();
+ assert(in_aio_context_home_thread(ioc_ctx));
/* Allow interrupting the operation by reentering the coroutine other than
* through the aio_fd_handlers. */
- if (condition == G_IO_IN && ioc->read_coroutine) {
- ioc->read_coroutine = NULL;
+ if (condition == G_IO_IN) {
+ assert(ioc->read_coroutine == NULL);
qio_channel_set_aio_fd_handlers(ioc);
- } else if (condition == G_IO_OUT && ioc->write_coroutine) {
- ioc->write_coroutine = NULL;
+ } else if (condition == G_IO_OUT) {
+ assert(ioc->write_coroutine == NULL);
qio_channel_set_aio_fd_handlers(ioc);
}
}
+void qio_channel_wake_read(QIOChannel *ioc)
+{
+ Coroutine *co = qatomic_xchg(&ioc->read_coroutine, NULL);
+ if (co) {
+ aio_co_wake(co);
+ }
+}
static gboolean qio_channel_wait_complete(QIOChannel *ioc,
GIOCondition condition,
diff --git a/nbd/server.c b/nbd/server.c
index 3d8d0d81df..ea47522e8f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1599,8 +1599,7 @@ static bool nbd_drained_poll(void *opaque)
* enter it here so we don't depend on the client to wake it up.
*/
if (client->recv_coroutine != NULL && client->read_yielding) {
- qemu_aio_coroutine_enter(exp->common.ctx,
- client->recv_coroutine);
+ qio_channel_wake_read(client->ioc);
}
return true;
--
2.39.2