qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for 7.2-rc3  v1 0/2] virtio fixes
@ 2022-11-23 15:21 Alex Bennée
  2022-11-23 15:21 ` [PATCH v1 1/2] include/hw: attempt to document VirtIO feature variables Alex Bennée
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-23 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, Alex Bennée

Hi,

This hopefully fixes the problems with VirtIO migration caused by the
previous refactoring of virtio_device_started(). That introduced a
different order of checking which didn't give the VM state primacy but
wasn't noticed as we don't properly exercise VirtIO device migration
and caused issues when dev->started wasn't checked in the core code.
The introduction of virtio_device_should_start() split the overloaded
function up but the broken order still remained. The series finally
fixes that by restoring the original semantics but with the cleaned up
functions.

I've added more documentation to the various structures involved as
well as the functions. There is still some inconsistencies in the
VirtIO code between different devices but I think that can be looked
at over the 8.0 cycle.

Alex Bennée (2):
  include/hw: attempt to document VirtIO feature variables
  include/hw: VM state takes precedence in virtio_device_should_start

 include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
 include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
 2 files changed, 59 insertions(+), 9 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v1 1/2] include/hw: attempt to document VirtIO feature variables
  2022-11-23 15:21 [PATCH for 7.2-rc3 v1 0/2] virtio fixes Alex Bennée
@ 2022-11-23 15:21 ` Alex Bennée
  2022-11-23 15:21 ` [PATCH v1 2/2] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
  2022-11-23 15:26 ` [PATCH for 7.2-rc3 v1 0/2] virtio fixes Michael S. Tsirkin
  2 siblings, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-23 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, Alex Bennée, Stefano Garzarella,
	Stefan Hajnoczi

We have a bunch of variables associated with the device and the vhost
backend which are used inconsistently throughout the code base. Lets
start trying to bring some order by agreeing what each variable is
for.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>

---
v2
  - dropped DISCUSS and commentary
  - separated protocol section for clarity
  - updated working on vhost->backend_features
  - made clear guest_features was the written state
---
 include/hw/virtio/vhost.h  | 25 ++++++++++++++++++++++---
 include/hw/virtio/virtio.h | 19 ++++++++++++++++++-
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 353252ac3e..eaf628f656 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -88,13 +88,32 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    /**
+     * vhost feature handling requires matching the feature set
+     * offered by a backend which may be a subset of the total
+     * features eventually offered to the guest.
+     *
+     * @features: available features provided by the backend
+     * @acked_features: final negotiated features with front-end driver
+     *
+     * @backend_features: this is used in a couple of places to either
+     * store VHOST_USER_F_PROTOCOL_FEATURES to apply to
+     * VHOST_USER_SET_FEATURES or VHOST_NET_F_VIRTIO_NET_HDR. Its
+     * future use should be discouraged and the variable retired as
+     * its easy to confuse with the VirtIO backend_features.
+     */
     uint64_t features;
-    /** @acked_features: final set of negotiated features */
     uint64_t acked_features;
-    /** @backend_features: backend specific feature bits */
     uint64_t backend_features;
-    /** @protocol_features: final negotiated protocol features */
+
+    /**
+     * @protocol_features: is the vhost-user only feature set by
+     * VHOST_USER_SET_PROTOCOL_FEATURES. Protocol features are only
+     * negotiated if VHOST_USER_F_PROTOCOL_FEATURES has been offered
+     * by the backend (see @features).
+     */
     uint64_t protocol_features;
+
     uint64_t max_queues;
     uint64_t backend_cap;
     /* @started: is the vhost device started? */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index a973811cbf..0f612067f7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -93,6 +93,12 @@ enum virtio_device_endian {
     VIRTIO_DEVICE_ENDIAN_BIG,
 };
 
+/**
+ * struct VirtIODevice - common VirtIO structure
+ * @name: name of the device
+ * @status: VirtIO Device Status field
+ *
+ */
 struct VirtIODevice
 {
     DeviceState parent_obj;
@@ -100,9 +106,20 @@ struct VirtIODevice
     uint8_t status;
     uint8_t isr;
     uint16_t queue_sel;
-    uint64_t guest_features;
+    /**
+     * These fields represent a set of VirtIO features at various
+     * levels of the stack. @host_features indicates the complete
+     * feature set the VirtIO device can offer to the driver.
+     * @guest_features indicates which features the VirtIO driver has
+     * selected by writing to the feature register. Finally
+     * @backend_features represents everything supported by the
+     * backend (e.g. vhost) and could potentially be a subset of the
+     * total feature set offered by QEMU.
+     */
     uint64_t host_features;
+    uint64_t guest_features;
     uint64_t backend_features;
+
     size_t config_len;
     void *config;
     uint16_t config_vector;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 2/2] include/hw: VM state takes precedence in virtio_device_should_start
  2022-11-23 15:21 [PATCH for 7.2-rc3 v1 0/2] virtio fixes Alex Bennée
  2022-11-23 15:21 ` [PATCH v1 1/2] include/hw: attempt to document VirtIO feature variables Alex Bennée
@ 2022-11-23 15:21 ` Alex Bennée
  2022-11-23 15:26 ` [PATCH for 7.2-rc3 v1 0/2] virtio fixes Michael S. Tsirkin
  2 siblings, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-23 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, Alex Bennée, Christian Borntraeger

The VM status should always preempt the device status for these
checks. This ensures the device is in the correct state when we
suspend the VM prior to migrations. This restores the checks to the
order they where in before the refactoring moved things around.

While we are at it lets improve our documentation of the various
fields involved and document the two functions.

Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Fixes: 259d69c00b (hw/virtio: introduce virtio_device_should_start)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
 include/hw/virtio/virtio.h | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 0f612067f7..48f539d0fe 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -133,6 +133,13 @@ struct VirtIODevice
     bool broken; /* device in invalid state, needs reset */
     bool use_disabled_flag; /* allow use of 'disable' flag when needed */
     bool disabled; /* device in temporarily disabled state */
+    /**
+     * @use_started: true if the @started flag should be used to check the
+     * current state of the VirtIO device. Otherwise status bits
+     * should be checked for a current status of the device.
+     * @use_started is only set via QMP and defaults to true for all
+     * modern machines (since 4.1).
+     */
     bool use_started;
     bool started;
     bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
@@ -408,6 +415,17 @@ static inline bool virtio_is_big_endian(VirtIODevice *vdev)
     return false;
 }
 
+
+/**
+ * virtio_device_should_start() - check if device started
+ * @vdev - the VirtIO device
+ * @status - the devices status bits
+ *
+ * Check if the device is started. For most modern machines this is
+ * tracked via the @vdev->started field (to support migration),
+ * otherwise we check for the final negotiated status bit that
+ * indicates everything is ready.
+ */
 static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
 {
     if (vdev->use_started) {
@@ -428,15 +446,11 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
  */
 static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status)
 {
-    if (vdev->use_started) {
-        return vdev->started;
-    }
-
     if (!vdev->vm_running) {
         return false;
     }
 
-    return status & VIRTIO_CONFIG_S_DRIVER_OK;
+    return virtio_device_started(vdev, status);
 }
 
 static inline void virtio_set_started(VirtIODevice *vdev, bool started)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-23 15:21 [PATCH for 7.2-rc3 v1 0/2] virtio fixes Alex Bennée
  2022-11-23 15:21 ` [PATCH v1 1/2] include/hw: attempt to document VirtIO feature variables Alex Bennée
  2022-11-23 15:21 ` [PATCH v1 2/2] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
@ 2022-11-23 15:26 ` Michael S. Tsirkin
  2022-11-23 16:03   ` Alex Bennée
  2 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-11-23 15:26 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar

On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> Hi,
> 
> This hopefully fixes the problems with VirtIO migration caused by the
> previous refactoring of virtio_device_started(). That introduced a
> different order of checking which didn't give the VM state primacy but
> wasn't noticed as we don't properly exercise VirtIO device migration
> and caused issues when dev->started wasn't checked in the core code.
> The introduction of virtio_device_should_start() split the overloaded
> function up but the broken order still remained. The series finally
> fixes that by restoring the original semantics but with the cleaned up
> functions.
> 
> I've added more documentation to the various structures involved as
> well as the functions. There is still some inconsistencies in the
> VirtIO code between different devices but I think that can be looked
> at over the 8.0 cycle.


Thanks a lot! Did you try this with gitlab CI? A patch similar to your
2/2 broke it previously ...

> Alex Bennée (2):
>   include/hw: attempt to document VirtIO feature variables
>   include/hw: VM state takes precedence in virtio_device_should_start
> 
>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>  2 files changed, 59 insertions(+), 9 deletions(-)
> 
> -- 
> 2.34.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-23 15:26 ` [PATCH for 7.2-rc3 v1 0/2] virtio fixes Michael S. Tsirkin
@ 2022-11-23 16:03   ` Alex Bennée
  2022-11-23 16:08     ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2022-11-23 16:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> Hi,
>> 
>> This hopefully fixes the problems with VirtIO migration caused by the
>> previous refactoring of virtio_device_started(). That introduced a
>> different order of checking which didn't give the VM state primacy but
>> wasn't noticed as we don't properly exercise VirtIO device migration
>> and caused issues when dev->started wasn't checked in the core code.
>> The introduction of virtio_device_should_start() split the overloaded
>> function up but the broken order still remained. The series finally
>> fixes that by restoring the original semantics but with the cleaned up
>> functions.
>> 
>> I've added more documentation to the various structures involved as
>> well as the functions. There is still some inconsistencies in the
>> VirtIO code between different devices but I think that can be looked
>> at over the 8.0 cycle.
>
>
> Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> 2/2 broke it previously ...

Looking into it now - so far hasn't broken locally but I guess there is
something different about the CI.

>
>> Alex Bennée (2):
>>   include/hw: attempt to document VirtIO feature variables
>>   include/hw: VM state takes precedence in virtio_device_should_start
>> 
>>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>>  2 files changed, 59 insertions(+), 9 deletions(-)
>> 
>> -- 
>> 2.34.1


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-23 16:03   ` Alex Bennée
@ 2022-11-23 16:08     ` Michael S. Tsirkin
  2022-11-24  9:21       ` Alex Bennée
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-11-23 16:08 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar

On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> Hi,
> >> 
> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> previous refactoring of virtio_device_started(). That introduced a
> >> different order of checking which didn't give the VM state primacy but
> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> and caused issues when dev->started wasn't checked in the core code.
> >> The introduction of virtio_device_should_start() split the overloaded
> >> function up but the broken order still remained. The series finally
> >> fixes that by restoring the original semantics but with the cleaned up
> >> functions.
> >> 
> >> I've added more documentation to the various structures involved as
> >> well as the functions. There is still some inconsistencies in the
> >> VirtIO code between different devices but I think that can be looked
> >> at over the 8.0 cycle.
> >
> >
> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> > 2/2 broke it previously ...
> 
> Looking into it now - so far hasn't broken locally but I guess there is
> something different about the CI.


yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2

Or with QEMU_CI set to 1 and then run fedora container and then clang-system manually.

> >
> >> Alex Bennée (2):
> >>   include/hw: attempt to document VirtIO feature variables
> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> 
> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> 
> >> -- 
> >> 2.34.1
> 
> 
> -- 
> Alex Bennée



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-23 16:08     ` Michael S. Tsirkin
@ 2022-11-24  9:21       ` Alex Bennée
  2022-11-24 10:53         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2022-11-24  9:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> Hi,
>> >> 
>> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> previous refactoring of virtio_device_started(). That introduced a
>> >> different order of checking which didn't give the VM state primacy but
>> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> and caused issues when dev->started wasn't checked in the core code.
>> >> The introduction of virtio_device_should_start() split the overloaded
>> >> function up but the broken order still remained. The series finally
>> >> fixes that by restoring the original semantics but with the cleaned up
>> >> functions.
>> >> 
>> >> I've added more documentation to the various structures involved as
>> >> well as the functions. There is still some inconsistencies in the
>> >> VirtIO code between different devices but I think that can be looked
>> >> at over the 8.0 cycle.
>> >
>> >
>> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> > 2/2 broke it previously ...
>> 
>> Looking into it now - so far hasn't broken locally but I guess there is
>> something different about the CI.
>
>
> yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>
> Or with QEMU_CI set to 1 and then run fedora container and then
> clang-system manually.

I'm having trouble re-creating the failures in CI locally on my boxen. I
have triggered a bug on s390 but that looks like a pre-existing problem
with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
think that is a limitation of the test harness.

Will keep looking.

>
>> >
>> >> Alex Bennée (2):
>> >>   include/hw: attempt to document VirtIO feature variables
>> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> 
>> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> 
>> >> -- 
>> >> 2.34.1
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-24  9:21       ` Alex Bennée
@ 2022-11-24 10:53         ` Michael S. Tsirkin
  2022-11-24 13:11           ` Alex Bennée
  2022-11-24 22:24           ` Alex Bennée
  0 siblings, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-11-24 10:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar

On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> >> 
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> >> Hi,
> >> >> 
> >> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> >> previous refactoring of virtio_device_started(). That introduced a
> >> >> different order of checking which didn't give the VM state primacy but
> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> >> and caused issues when dev->started wasn't checked in the core code.
> >> >> The introduction of virtio_device_should_start() split the overloaded
> >> >> function up but the broken order still remained. The series finally
> >> >> fixes that by restoring the original semantics but with the cleaned up
> >> >> functions.
> >> >> 
> >> >> I've added more documentation to the various structures involved as
> >> >> well as the functions. There is still some inconsistencies in the
> >> >> VirtIO code between different devices but I think that can be looked
> >> >> at over the 8.0 cycle.
> >> >
> >> >
> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> >> > 2/2 broke it previously ...
> >> 
> >> Looking into it now - so far hasn't broken locally but I guess there is
> >> something different about the CI.
> >
> >
> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
> >
> > Or with QEMU_CI set to 1 and then run fedora container and then
> > clang-system manually.
> 
> I'm having trouble re-creating the failures in CI locally on my boxen. I
> have triggered a bug on s390 but that looks like a pre-existing problem
> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
> think that is a limitation of the test harness.
> 
> Will keep looking.

Why not just trigger it on gitlab CI - it's very repeatable there?

> >
> >> >
> >> >> Alex Bennée (2):
> >> >>   include/hw: attempt to document VirtIO feature variables
> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> >> 
> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> >> 
> >> >> -- 
> >> >> 2.34.1
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> 
> 
> -- 
> Alex Bennée



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-24 10:53         ` Michael S. Tsirkin
@ 2022-11-24 13:11           ` Alex Bennée
  2022-11-24 22:24           ` Alex Bennée
  1 sibling, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-24 13:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> function up but the broken order still remained. The series finally
>> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> functions.
>> >> >> 
>> >> >> I've added more documentation to the various structures involved as
>> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> at over the 8.0 cycle.
>> >> >
>> >> >
>> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> > 2/2 broke it previously ...
>> >> 
>> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> something different about the CI.
>> >
>> >
>> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >
>> > Or with QEMU_CI set to 1 and then run fedora container and then
>> > clang-system manually.
>> 
>> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> have triggered a bug on s390 but that looks like a pre-existing problem
>> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> think that is a limitation of the test harness.
>> 
>> Will keep looking.
>
> Why not just trigger it on gitlab CI - it's very repeatable there?

I've got a fix for gpio and am running it through CI now:

  https://gitlab.com/stsquad/qemu/-/pipelines/704285944

My main concern is I had to do something no other vhost-user device does
and I'm not sure if thats down to misunderstanding or the other devices
just getting lucky.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-24 10:53         ` Michael S. Tsirkin
  2022-11-24 13:11           ` Alex Bennée
@ 2022-11-24 22:24           ` Alex Bennée
  2022-11-25 12:05             ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2022-11-24 22:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> function up but the broken order still remained. The series finally
>> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> functions.
>> >> >> 
>> >> >> I've added more documentation to the various structures involved as
>> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> at over the 8.0 cycle.
>> >> >
>> >> >
>> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> > 2/2 broke it previously ...
>> >> 
>> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> something different about the CI.
>> >
>> >
>> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >
>> > Or with QEMU_CI set to 1 and then run fedora container and then
>> > clang-system manually.
>> 
>> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> have triggered a bug on s390 but that looks like a pre-existing problem
>> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> think that is a limitation of the test harness.
>> 
>> Will keep looking.
>
> Why not just trigger it on gitlab CI - it's very repeatable there?

I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:

  # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
  # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#

And the command:

  env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess

Gives the following failure, while a leak may not be that exciting it
does point to a potential corruption issue. Unfortunately I don't get a
decent backtrace from the tool:

  # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
  # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
  ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
  # Start of arm tests
  # Start of virt tests
  # Start of virtio-mmio tests
  # Start of virtio-bus tests
  # Start of vhost-user-gpio-device tests
  # Start of vhost-user-gpio tests
  # Start of vhost-user-gpio-tests tests
  # Start of read-guest-mem tests
  # Start of memfile tests
  # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
  # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
  ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
  # set_protocol_features: 0x200
  # set_owner: start of session
  # vhost-user: un-handled message: 14
  # vhost-user: un-handled message: 14
  # set_vring_num: 0/1024
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
  ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
  # End of memfile tests
  # End of read-guest-mem tests
  # End of vhost-user-gpio-tests tests
  # End of vhost-user-gpio tests
  # End of vhost-user-gpio-device tests
  # End of virtio-bus tests
  # End of virtio-mmio tests
  # End of virt tests
  # End of arm tests
  1..1

  =================================================================
  ==1024371==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
      #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
      #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
      #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
      #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
      #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
      #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
      #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
      #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
      #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
      #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
      #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
      #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5

  SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
  ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
  fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
  🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
  ✗



>> >
>> >> >
>> >> >> Alex Bennée (2):
>> >> >>   include/hw: attempt to document VirtIO feature variables
>> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> >> 
>> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> >> 
>> >> >> -- 
>> >> >> 2.34.1
>> >> 
>> >> 
>> >> -- 
>> >> Alex Bennée
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-24 22:24           ` Alex Bennée
@ 2022-11-25 12:05             ` Michael S. Tsirkin
  2022-11-25 14:51               ` Alex Bennée
  2022-11-25 15:14               ` Alex Bennée
  0 siblings, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-11-25 12:05 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar

On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
> >> 
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> >> >> 
> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> 
> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> >> >> Hi,
> >> >> >> 
> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> >> >> previous refactoring of virtio_device_started(). That introduced a
> >> >> >> different order of checking which didn't give the VM state primacy but
> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> >> >> and caused issues when dev->started wasn't checked in the core code.
> >> >> >> The introduction of virtio_device_should_start() split the overloaded
> >> >> >> function up but the broken order still remained. The series finally
> >> >> >> fixes that by restoring the original semantics but with the cleaned up
> >> >> >> functions.
> >> >> >> 
> >> >> >> I've added more documentation to the various structures involved as
> >> >> >> well as the functions. There is still some inconsistencies in the
> >> >> >> VirtIO code between different devices but I think that can be looked
> >> >> >> at over the 8.0 cycle.
> >> >> >
> >> >> >
> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> >> >> > 2/2 broke it previously ...
> >> >> 
> >> >> Looking into it now - so far hasn't broken locally but I guess there is
> >> >> something different about the CI.
> >> >
> >> >
> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
> >> >
> >> > Or with QEMU_CI set to 1 and then run fedora container and then
> >> > clang-system manually.
> >> 
> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
> >> have triggered a bug on s390 but that looks like a pre-existing problem
> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
> >> think that is a limitation of the test harness.
> >> 
> >> Will keep looking.
> >
> > Why not just trigger it on gitlab CI - it's very repeatable there?
> 
> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
> 
>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
> 
> And the command:
> 
>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
> 
> Gives the following failure, while a leak may not be that exciting it
> does point to a potential corruption issue. Unfortunately I don't get a
> decent backtrace from the tool:
> 
>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>   # Start of arm tests
>   # Start of virt tests
>   # Start of virtio-mmio tests
>   # Start of virtio-bus tests
>   # Start of vhost-user-gpio-device tests
>   # Start of vhost-user-gpio tests
>   # Start of vhost-user-gpio-tests tests
>   # Start of read-guest-mem tests
>   # Start of memfile tests
>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>   # set_protocol_features: 0x200
>   # set_owner: start of session
>   # vhost-user: un-handled message: 14
>   # vhost-user: un-handled message: 14
>   # set_vring_num: 0/1024
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>   # End of memfile tests
>   # End of read-guest-mem tests
>   # End of vhost-user-gpio-tests tests
>   # End of vhost-user-gpio tests
>   # End of vhost-user-gpio-device tests
>   # End of virtio-bus tests
>   # End of virtio-mmio tests
>   # End of virt tests
>   # End of arm tests
>   1..1
> 
>   =================================================================
>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
> 
>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
> 
>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>   ✗


ok ... was gpio always like this? from 1st commit? if not bisect?

> 
> 
> >> >
> >> >> >
> >> >> >> Alex Bennée (2):
> >> >> >>   include/hw: attempt to document VirtIO feature variables
> >> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> >> >> 
> >> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> >> >> 
> >> >> >> -- 
> >> >> >> 2.34.1
> >> >> 
> >> >> 
> >> >> -- 
> >> >> Alex Bennée
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> 
> 
> -- 
> Alex Bennée



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-25 12:05             ` Michael S. Tsirkin
@ 2022-11-25 14:51               ` Alex Bennée
  2022-11-25 15:14               ` Alex Bennée
  1 sibling, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-25 14:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> >> 
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> >> 
>> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> >> function up but the broken order still remained. The series finally
>> >> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> >> functions.
>> >> >> >> 
>> >> >> >> I've added more documentation to the various structures involved as
>> >> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> >> at over the 8.0 cycle.
>> >> >> >
>> >> >> >
>> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> >> > 2/2 broke it previously ...
>> >> >> 
>> >> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> >> something different about the CI.
>> >> >
>> >> >
>> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >> >
>> >> > Or with QEMU_CI set to 1 and then run fedora container and then
>> >> > clang-system manually.
>> >> 
>> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> >> have triggered a bug on s390 but that looks like a pre-existing problem
>> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> >> think that is a limitation of the test harness.
>> >> 
>> >> Will keep looking.
>> >
>> > Why not just trigger it on gitlab CI - it's very repeatable there?
>> 
>> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
>> 
>>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
>> 
>> And the command:
>> 
>>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
>> 
>> Gives the following failure, while a leak may not be that exciting it
>> does point to a potential corruption issue. Unfortunately I don't get a
>> decent backtrace from the tool:
>> 
>>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # Start of arm tests
>>   # Start of virt tests
>>   # Start of virtio-mmio tests
>>   # Start of virtio-bus tests
>>   # Start of vhost-user-gpio-device tests
>>   # Start of vhost-user-gpio tests
>>   # Start of vhost-user-gpio-tests tests
>>   # Start of read-guest-mem tests
>>   # Start of memfile tests
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # set_protocol_features: 0x200
>>   # set_owner: start of session
>>   # vhost-user: un-handled message: 14
>>   # vhost-user: un-handled message: 14
>>   # set_vring_num: 0/1024
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>>   # End of memfile tests
>>   # End of read-guest-mem tests
>>   # End of vhost-user-gpio-tests tests
>>   # End of vhost-user-gpio tests
>>   # End of vhost-user-gpio-device tests
>>   # End of virtio-bus tests
>>   # End of virtio-mmio tests
>>   # End of virt tests
>>   # End of arm tests
>>   1..1
>> 
>>   =================================================================
>>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
>> 
>>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
>> 
>>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>>   ✗
>
>
> ok ... was gpio always like this? from 1st commit? if not bisect?

I think so. I've managed to track down what the malloc is. It's the
memory allocated by:

  gpio->vhost_dev.vqs = g_new0(struct vhost_virtqueue, gpio->vhost_dev.nvqs);

It is never freed because we never get to vu_gpio_device_unrealize() but
as far as I can tell none of the qtests ever trigger the unrealize() so
I'm not sure why it is special.

>
>> 
>> 
>> >> >
>> >> >> >
>> >> >> >> Alex Bennée (2):
>> >> >> >>   include/hw: attempt to document VirtIO feature variables
>> >> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> >> >> 
>> >> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> >> >> 
>> >> >> >> -- 
>> >> >> >> 2.34.1
>> >> >> 
>> >> >> 
>> >> >> -- 
>> >> >> Alex Bennée
>> >> 
>> >> 
>> >> -- 
>> >> Alex Bennée
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 7.2-rc3  v1 0/2] virtio fixes
  2022-11-25 12:05             ` Michael S. Tsirkin
  2022-11-25 14:51               ` Alex Bennée
@ 2022-11-25 15:14               ` Alex Bennée
  1 sibling, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2022-11-25 15:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, slp, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar


"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> >> 
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> >> 
>> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> >> function up but the broken order still remained. The series finally
>> >> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> >> functions.
>> >> >> >> 
>> >> >> >> I've added more documentation to the various structures involved as
>> >> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> >> at over the 8.0 cycle.
>> >> >> >
>> >> >> >
>> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> >> > 2/2 broke it previously ...
>> >> >> 
>> >> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> >> something different about the CI.
>> >> >
>> >> >
>> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >> >
>> >> > Or with QEMU_CI set to 1 and then run fedora container and then
>> >> > clang-system manually.
>> >> 
>> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> >> have triggered a bug on s390 but that looks like a pre-existing problem
>> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> >> think that is a limitation of the test harness.
>> >> 
>> >> Will keep looking.
>> >
>> > Why not just trigger it on gitlab CI - it's very repeatable there?
>> 
>> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
>> 
>>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
>> 
>> And the command:
>> 
>>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
>> 
>> Gives the following failure, while a leak may not be that exciting it
>> does point to a potential corruption issue. Unfortunately I don't get a
>> decent backtrace from the tool:
>> 
>>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # Start of arm tests
>>   # Start of virt tests
>>   # Start of virtio-mmio tests
>>   # Start of virtio-bus tests
>>   # Start of vhost-user-gpio-device tests
>>   # Start of vhost-user-gpio tests
>>   # Start of vhost-user-gpio-tests tests
>>   # Start of read-guest-mem tests
>>   # Start of memfile tests
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # set_protocol_features: 0x200
>>   # set_owner: start of session
>>   # vhost-user: un-handled message: 14
>>   # vhost-user: un-handled message: 14
>>   # set_vring_num: 0/1024
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>>   # End of memfile tests
>>   # End of read-guest-mem tests
>>   # End of vhost-user-gpio-tests tests
>>   # End of vhost-user-gpio tests
>>   # End of vhost-user-gpio-device tests
>>   # End of virtio-bus tests
>>   # End of virtio-mmio tests
>>   # End of virt tests
>>   # End of arm tests
>>   1..1
>> 
>>   =================================================================
>>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
>> 
>>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
>> 
>>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>>   ✗
>
>
> ok ... was gpio always like this? from 1st commit? if not bisect?
<snip>

I'm almost tempted to drop the mmio variant of the gpio test. I suspect
there is a reason the only other mmio vhost-user test is for virtio-net.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-25 15:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-23 15:21 [PATCH for 7.2-rc3 v1 0/2] virtio fixes Alex Bennée
2022-11-23 15:21 ` [PATCH v1 1/2] include/hw: attempt to document VirtIO feature variables Alex Bennée
2022-11-23 15:21 ` [PATCH v1 2/2] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
2022-11-23 15:26 ` [PATCH for 7.2-rc3 v1 0/2] virtio fixes Michael S. Tsirkin
2022-11-23 16:03   ` Alex Bennée
2022-11-23 16:08     ` Michael S. Tsirkin
2022-11-24  9:21       ` Alex Bennée
2022-11-24 10:53         ` Michael S. Tsirkin
2022-11-24 13:11           ` Alex Bennée
2022-11-24 22:24           ` Alex Bennée
2022-11-25 12:05             ` Michael S. Tsirkin
2022-11-25 14:51               ` Alex Bennée
2022-11-25 15:14               ` Alex Bennée

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).