[PATCH v2 1/1] io: make zerocopy fallback accounting more accurate

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
@ 2026-03-09  9:09 Tejus GK
  2026-03-09 16:48 ` Daniel P. Berrangé
  2026-03-10  8:52 ` Markus Armbruster
  0 siblings, 2 replies; 15+ messages in thread
From: Tejus GK @ 2026-03-09  9:09 UTC (permalink / raw)
  To: qemu-devel, Daniel P. Berrangé, Peter Xu, Fabiano Rosas,
	Eric Blake, Markus Armbruster
  Cc: Tejus GK

Currently, the dirty-sync-missed-zero-copy stat is incremented only when
an entire batch of IO operations fails to use zerocopy and falls back to
a normal copy. As long as at least one IO in the batch is successfully
zero-copied, the whole batch is treated as a success. This hides
individual IO fallbacks and makes the migration stat less accurate than
it could be.

Make the stat more accurate by reporting at a finer granularity, i.e, by
incrementing for every individual IO fallback that occurs.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
---
 include/io/channel-socket.h |  6 +-----
 include/io/channel.h        |  5 ++---
 io/channel-socket.c         | 13 ++++---------
 migration/multifd.c         |  8 +++++---
 qapi/migration.json         |  3 +--
 5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index a1ef3136ea..5c5714668e 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -50,11 +50,7 @@ struct QIOChannelSocket {
     ssize_t zero_copy_queued;
     ssize_t zero_copy_sent;
     bool blocking;
-    /**
-     * This flag indicates whether any new data was successfully sent with
-     * zerocopy since the last qio_channel_socket_flush() call.
-     */
-    bool new_zero_copy_sent_success;
+    ssize_t zero_copy_fallback;
 };
 
 
diff --git a/include/io/channel.h b/include/io/channel.h
index 1b02350437..70f701ae16 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -1013,9 +1013,8 @@ int coroutine_mixed_fn qio_channel_writev_full_all(QIOChannel *ioc,
  *
  * If not implemented, acts as a no-op, and returns 0.
  *
- * Returns -1 if any error is found,
- *          1 if every send failed to use zero copy.
- *          0 otherwise.
+ * Returns -1 if any error is found, otherwise returns a non-negative number
+ * indicating the number of IO requests that fell back to copy.
  */
 
 int qio_channel_flush(QIOChannel *ioc,
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 3053b35ad8..08b9862074 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -72,7 +72,7 @@ qio_channel_socket_new(void)
     sioc->zero_copy_queued = 0;
     sioc->zero_copy_sent = 0;
     sioc->blocking = false;
-    sioc->new_zero_copy_sent_success = false;
+    sioc->zero_copy_fallback = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
 
         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
-        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
-            sioc->new_zero_copy_sent_success = true;
+        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
+            sioc->zero_copy_fallback++;
         }
     }
 
@@ -900,12 +900,7 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
         return ret;
     }
 
-    if (sioc->new_zero_copy_sent_success) {
-        sioc->new_zero_copy_sent_success = false;
-        return 0;
-    }
-
-    return 1;
+    return sioc->zero_copy_fallback;
 }
 
 #endif /* QEMU_MSG_ZEROCOPY */
diff --git a/migration/multifd.c b/migration/multifd.c
index ad6261688f..726e40903d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -593,7 +593,7 @@ void multifd_send_shutdown(void)
 
 static int multifd_zero_copy_flush(QIOChannel *c)
 {
-    int ret;
+    ssize_t ret;
     Error *err = NULL;
 
     ret = qio_channel_flush(c, &err);
@@ -601,8 +601,10 @@ static int multifd_zero_copy_flush(QIOChannel *c)
         error_report_err(err);
         return -1;
     }
-    if (ret == 1) {
-        qatomic_add(&mig_stats.dirty_sync_missed_zero_copy, 1);
+
+    if (qatomic_read(&mig_stats.dirty_sync_missed_zero_copy) != (uint64_t)ret) {
+        /* Update statistics if more fallback detected */
+        qatomic_set(&mig_stats.dirty_sync_missed_zero_copy, (uint64_t)ret);
     }
 
     return ret;
diff --git a/qapi/migration.json b/qapi/migration.json
index f925e5541b..94977b8810 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -58,8 +58,7 @@
 #     (since 7.0).
 #
 # @dirty-sync-missed-zero-copy: Number of times dirty RAM
-#     synchronization could not avoid copying dirty pages.  This is
-#     between 0 and @dirty-sync-count * @multifd-channels.
+#     synchronization could not avoid copying dirty pages.
 #     (since 7.1)
 #
 # Since: 0.14
-- 
2.43.7



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09  9:09 [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate Tejus GK
@ 2026-03-09 16:48 ` Daniel P. Berrangé
  2026-03-09 16:59   ` Peter Xu
  2026-03-10  8:52 ` Markus Armbruster
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-09 16:48 UTC (permalink / raw)
  To: Tejus GK; +Cc: qemu-devel, Peter Xu, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 09:09:01AM +0000, Tejus GK wrote:
> Currently, the dirty-sync-missed-zero-copy stat is incremented only when
> an entire batch of IO operations fails to use zerocopy and falls back to
> a normal copy. As long as at least one IO in the batch is successfully
> zero-copied, the whole batch is treated as a success. This hides
> individual IO fallbacks and makes the migration stat less accurate than
> it could be.
> 
> Make the stat more accurate by reporting at a finer granularity, i.e, by
> incrementing for every individual IO fallback that occurs.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
> ---
>  include/io/channel-socket.h |  6 +-----
>  include/io/channel.h        |  5 ++---
>  io/channel-socket.c         | 13 ++++---------
>  migration/multifd.c         |  8 +++++---
>  qapi/migration.json         |  3 +--
>  5 files changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index a1ef3136ea..5c5714668e 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -50,11 +50,7 @@ struct QIOChannelSocket {
>      ssize_t zero_copy_queued;
>      ssize_t zero_copy_sent;
>      bool blocking;
> -    /**
> -     * This flag indicates whether any new data was successfully sent with
> -     * zerocopy since the last qio_channel_socket_flush() call.
> -     */
> -    bool new_zero_copy_sent_success;
> +    ssize_t zero_copy_fallback;
>  };
>  
>  
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 1b02350437..70f701ae16 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -1013,9 +1013,8 @@ int coroutine_mixed_fn qio_channel_writev_full_all(QIOChannel *ioc,
>   *
>   * If not implemented, acts as a no-op, and returns 0.
>   *
> - * Returns -1 if any error is found,
> - *          1 if every send failed to use zero copy.
> - *          0 otherwise.
> + * Returns -1 if any error is found, otherwise returns a non-negative number
> + * indicating the number of IO requests that fell back to copy.

IIUC that is *not* what the code is actually counting though.....

>   */
>  
>  int qio_channel_flush(QIOChannel *ioc,
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 3053b35ad8..08b9862074 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -72,7 +72,7 @@ qio_channel_socket_new(void)
>      sioc->zero_copy_queued = 0;
>      sioc->zero_copy_sent = 0;
>      sioc->blocking = false;
> -    sioc->new_zero_copy_sent_success = false;
> +    sioc->zero_copy_fallback = 0;
>  
>      ioc = QIO_CHANNEL(sioc);
>      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
>          sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
>  
>          /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> -            sioc->new_zero_copy_sent_success = true;
> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> +            sioc->zero_copy_fallback++;

...this is counting the number of MSG_ERRQUEUE items, which is not
the same as the number of IO requests. That's why we only used it
as a boolean marker originally, rather than making it a counter.

>          }
>      }
>  
> @@ -900,12 +900,7 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
>          return ret;
>      }
>  
> -    if (sioc->new_zero_copy_sent_success) {
> -        sioc->new_zero_copy_sent_success = false;
> -        return 0;
> -    }
> -
> -    return 1;
> +    return sioc->zero_copy_fallback;
>  }
>  
>  #endif /* QEMU_MSG_ZEROCOPY */
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ad6261688f..726e40903d 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -593,7 +593,7 @@ void multifd_send_shutdown(void)
>  
>  static int multifd_zero_copy_flush(QIOChannel *c)
>  {
> -    int ret;
> +    ssize_t ret;
>      Error *err = NULL;
>  
>      ret = qio_channel_flush(c, &err);
> @@ -601,8 +601,10 @@ static int multifd_zero_copy_flush(QIOChannel *c)
>          error_report_err(err);
>          return -1;
>      }
> -    if (ret == 1) {
> -        qatomic_add(&mig_stats.dirty_sync_missed_zero_copy, 1);
> +
> +    if (qatomic_read(&mig_stats.dirty_sync_missed_zero_copy) != (uint64_t)ret) {
> +        /* Update statistics if more fallback detected */
> +        qatomic_set(&mig_stats.dirty_sync_missed_zero_copy, (uint64_t)ret);
>      }
>  
>      return ret;
> diff --git a/qapi/migration.json b/qapi/migration.json
> index f925e5541b..94977b8810 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -58,8 +58,7 @@
>  #     (since 7.0).
>  #
>  # @dirty-sync-missed-zero-copy: Number of times dirty RAM
> -#     synchronization could not avoid copying dirty pages.  This is
> -#     between 0 and @dirty-sync-count * @multifd-channels.
> +#     synchronization could not avoid copying dirty pages.
>  #     (since 7.1)
>  #
>  # Since: 0.14
> -- 
> 2.43.7
> 

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 16:48 ` Daniel P. Berrangé
@ 2026-03-09 16:59   ` Peter Xu
  2026-03-09 17:17     ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2026-03-09 16:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> >          sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> >  
> >          /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > -            sioc->new_zero_copy_sent_success = true;
> > +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > +            sioc->zero_copy_fallback++;
> 
> ...this is counting the number of MSG_ERRQUEUE items, which is not
> the same as the number of IO requests. That's why we only used it
> as a boolean marker originally, rather than making it a counter.

Would the logic still work and better than before?  Say, it's a counter of
"messages" rather than "IOs" then.

The problem with the old code was we may report fallback=0 even if there
can have fallback happened, as we mask that fact as long as one zerocopy
happened in the whole batch between two flushes.  So it seems this (even if
the counter is not per-IO) is still better.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 16:59   ` Peter Xu
@ 2026-03-09 17:17     ` Daniel P. Berrangé
  2026-03-09 17:42       ` Tejus GK
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-09 17:17 UTC (permalink / raw)
  To: Peter Xu; +Cc: Tejus GK, qemu-devel, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > >          sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > >  
> > >          /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > -            sioc->new_zero_copy_sent_success = true;
> > > +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > +            sioc->zero_copy_fallback++;
> > 
> > ...this is counting the number of MSG_ERRQUEUE items, which is not
> > the same as the number of IO requests. That's why we only used it
> > as a boolean marker originally, rather than making it a counter.
> 
> Would the logic still work and better than before?  Say, it's a counter of
> "messages" rather than "IOs" then.

IIUC it is a counter of processing notifications which is not directly
correlated to any action by QEMU - neither bytes nor syscalls.

> The problem with the old code was we may report fallback=0 even if there
> can have fallback happened, as we mask that fact as long as one zerocopy
> happened in the whole batch between two flushes.  So it seems this (even if
> the counter is not per-IO) is still better.

Better for what purpose though ?

If we enabled zero-copy, it is useful to know if /something/ managed
to benefit from zero-copy. ie if /always/ fails to zero-copy then
we can diagnose that the NIC driver isn't capable of it, or there
is some other limitation.  If something manages to zero-copy, then
we know the feature is functionally working.

What will we do with a count of notificaitons ?

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 17:17     ` Daniel P. Berrangé
@ 2026-03-09 17:42       ` Tejus GK
  2026-03-09 17:51         ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Tejus GK @ 2026-03-09 17:42 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster



> On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> !-------------------------------------------------------------------|
>  CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
>> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
>>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
>>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
>>>> 
>>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
>>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> -            sioc->new_zero_copy_sent_success = true;
>>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> +            sioc->zero_copy_fallback++;
>>> 
>>> ...this is counting the number of MSG_ERRQUEUE items, which is not
>>> the same as the number of IO requests. That's why we only used it
>>> as a boolean marker originally, rather than making it a counter.
>> 
>> Would the logic still work and better than before?  Say, it's a counter of
>> "messages" rather than "IOs" then.
> 
> IIUC it is a counter of processing notifications which is not directly
> correlated to any action by QEMU - neither bytes nor syscalls.

Please correct me if I'm wrong about this, isn’t each notification an information 
about what happened to an individual IO?
> 
>> The problem with the old code was we may report fallback=0 even if there
>> can have fallback happened, as we mask that fact as long as one zerocopy
>> happened in the whole batch between two flushes.  So it seems this (even if
>> the counter is not per-IO) is still better.
> 
> Better for what purpose though ?
> 
> If we enabled zero-copy, it is useful to know if /something/ managed
> to benefit from zero-copy. ie if /always/ fails to zero-copy then
> we can diagnose that the NIC driver isn't capable of it, or there
> is some other limitation.  If something manages to zero-copy, then
> we know the feature is functionally working.
> 
> What will we do with a count of notificaitons ?

I was wondering if it can be useful for debugging live migration issues where zerocopy is 
enabled. For instance, let’s say a zerocopy write failed due to the socket error queue being
full. Now it could be either due to the out of order processing we had seen before 
(https://github.com/qemu/qemu/commit/84005f4a2b8745e5934f955c045a0b4311cd0992) or 
due to it getting filled up because some copies getting deferred. For the latter, this stat can be
worthwile as a debugging stat. 

Regards, 
Tejus



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 17:42       ` Tejus GK
@ 2026-03-09 17:51         ` Daniel P. Berrangé
  2026-03-09 18:21           ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-09 17:51 UTC (permalink / raw)
  To: Tejus GK
  Cc: Peter Xu, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> 
> 
> > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > !-------------------------------------------------------------------|
> >  CAUTION: External Email
> > 
> > |-------------------------------------------------------------------!
> > 
> > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> >>>> 
> >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> >>>> -            sioc->new_zero_copy_sent_success = true;
> >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> >>>> +            sioc->zero_copy_fallback++;
> >>> 
> >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> >>> the same as the number of IO requests. That's why we only used it
> >>> as a boolean marker originally, rather than making it a counter.
> >> 
> >> Would the logic still work and better than before?  Say, it's a counter of
> >> "messages" rather than "IOs" then.
> > 
> > IIUC it is a counter of processing notifications which is not directly
> > correlated to any action by QEMU - neither bytes nor syscalls.
> 
> Please correct me if I'm wrong about this, isn’t each notification an information 
> about what happened to an individual IO?

If userspace hasn't read a queued notification yet, the kernel will
merge new notifications with the existing queued one.

The line above your change

  serr->ee_data - serr->ee_info + 1;

records how many notifications were merged, so we now how many
syscalls were processed.

If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
one syscall resulted in a copy, but that doesn't imply that *all*
syscalls resulted in a copy.

AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
or it could be 1000 out of 1000 resulted in a copy. We don't know.

IIUC the kernel's merging of notifications appears lossy wrt this
information. It could be partially mitigated by doing a flush for
notifications really really frequently but that feels like it would
have its own downsides


> >> The problem with the old code was we may report fallback=0 even if there
> >> can have fallback happened, as we mask that fact as long as one zerocopy
> >> happened in the whole batch between two flushes.  So it seems this (even if
> >> the counter is not per-IO) is still better.
> > 
> > Better for what purpose though ?
> > 
> > If we enabled zero-copy, it is useful to know if /something/ managed
> > to benefit from zero-copy. ie if /always/ fails to zero-copy then
> > we can diagnose that the NIC driver isn't capable of it, or there
> > is some other limitation.  If something manages to zero-copy, then
> > we know the feature is functionally working.
> > 
> > What will we do with a count of notificaitons ?
> 
> I was wondering if it can be useful for debugging live migration issues where zerocopy is 
> enabled. For instance, let’s say a zerocopy write failed due to the socket error queue being
> full. Now it could be either due to the out of order processing we had seen before 
> (https://github.com/qemu/qemu/commit/84005f4a2b8745e5934f955c045a0b4311cd0992) or 
> due to it getting filled up because some copies getting deferred. For the latter, this stat can be
> worthwile as a debugging stat. 



With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 17:51         ` Daniel P. Berrangé
@ 2026-03-09 18:21           ` Peter Xu
  2026-03-11 12:02             ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2026-03-09 18:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > 
> > 
> > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > 
> > > !-------------------------------------------------------------------|
> > >  CAUTION: External Email
> > > 
> > > |-------------------------------------------------------------------!
> > > 
> > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > >>>> 
> > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> -            sioc->new_zero_copy_sent_success = true;
> > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> +            sioc->zero_copy_fallback++;
> > >>> 
> > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > >>> the same as the number of IO requests. That's why we only used it
> > >>> as a boolean marker originally, rather than making it a counter.
> > >> 
> > >> Would the logic still work and better than before?  Say, it's a counter of
> > >> "messages" rather than "IOs" then.
> > > 
> > > IIUC it is a counter of processing notifications which is not directly
> > > correlated to any action by QEMU - neither bytes nor syscalls.
> > 
> > Please correct me if I'm wrong about this, isn’t each notification an information 
> > about what happened to an individual IO?
> 
> If userspace hasn't read a queued notification yet, the kernel will
> merge new notifications with the existing queued one.
> 
> The line above your change
> 
>   serr->ee_data - serr->ee_info + 1;
> 
> records how many notifications were merged, so we now how many
> syscalls were processed.
> 
> If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> one syscall resulted in a copy, but that doesn't imply that *all*
> syscalls resulted in a copy.
> 
> AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> or it could be 1000 out of 1000 resulted in a copy. We don't know.
> 
> IIUC the kernel's merging of notifications appears lossy wrt this
> information. It could be partially mitigated by doing a flush for
> notifications really really frequently but that feels like it would
> have its own downsides

IMHO what this change does is removing the false negatives.

Before this patch, if QEMU reports fallback=0, it doesn't mean all the
MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
justify it with one boolean over "a period of time" between two flushes, we
set the boolean to TRUE as long as there is _one_ successful report of
MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
at least one MSG_ZEROCOPY request that didn't fallback".  It has no
implication of whether a fallback happened.

Hence, before this v2 patch, there can be false negative reported by QEMU,
assuming there's no fallback (reflected in stats) but it actually happened.

After this patch, if QEMU reports fallback=0, it guarantees that _all_
MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
all messages and accumulate any fallback cases.  Even if the messages can
be merged, when "fallback" shows anything non-zero would imply some
fallback happened.  Here, the counter value doesn't really matter much
IMHO, as long as it becomes non-zero.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09  9:09 [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate Tejus GK
  2026-03-09 16:48 ` Daniel P. Berrangé
@ 2026-03-10  8:52 ` Markus Armbruster
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2026-03-10  8:52 UTC (permalink / raw)
  To: Tejus GK
  Cc: qemu-devel, Daniel P. Berrangé, Peter Xu, Fabiano Rosas,
	Eric Blake

Tejus GK <tejus.gk@nutanix.com> writes:

> Currently, the dirty-sync-missed-zero-copy stat is incremented only when
> an entire batch of IO operations fails to use zerocopy and falls back to
> a normal copy. As long as at least one IO in the batch is successfully
> zero-copied, the whole batch is treated as a success. This hides
> individual IO fallbacks and makes the migration stat less accurate than
> it could be.
>
> Make the stat more accurate by reporting at a finer granularity, i.e, by
> incrementing for every individual IO fallback that occurs.
>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Tejus GK <tejus.gk@nutanix.com>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index f925e5541b..94977b8810 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -58,8 +58,7 @@
>  #     (since 7.0).
>  #
>  # @dirty-sync-missed-zero-copy: Number of times dirty RAM
> -#     synchronization could not avoid copying dirty pages.  This is
> -#     between 0 and @dirty-sync-count * @multifd-channels.
> +#     synchronization could not avoid copying dirty pages.
>  #     (since 7.1)
>  #
>  # Since: 0.14

Dropping the documented upper limit might make the value harder to
interpret.  Thus my question: how is this value to be used?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-09 18:21           ` Peter Xu
@ 2026-03-11 12:02             ` Daniel P. Berrangé
  2026-03-11 15:30               ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-11 12:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > 
> > > 
> > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > 
> > > > !-------------------------------------------------------------------|
> > > >  CAUTION: External Email
> > > > 
> > > > |-------------------------------------------------------------------!
> > > > 
> > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > >>>> 
> > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > >>>> +            sioc->zero_copy_fallback++;
> > > >>> 
> > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > >>> the same as the number of IO requests. That's why we only used it
> > > >>> as a boolean marker originally, rather than making it a counter.
> > > >> 
> > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > >> "messages" rather than "IOs" then.
> > > > 
> > > > IIUC it is a counter of processing notifications which is not directly
> > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > 
> > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > about what happened to an individual IO?
> > 
> > If userspace hasn't read a queued notification yet, the kernel will
> > merge new notifications with the existing queued one.
> > 
> > The line above your change
> > 
> >   serr->ee_data - serr->ee_info + 1;
> > 
> > records how many notifications were merged, so we now how many
> > syscalls were processed.
> > 
> > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > one syscall resulted in a copy, but that doesn't imply that *all*
> > syscalls resulted in a copy.
> > 
> > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > 
> > IIUC the kernel's merging of notifications appears lossy wrt this
> > information. It could be partially mitigated by doing a flush for
> > notifications really really frequently but that feels like it would
> > have its own downsides
> 
> IMHO what this change does is removing the false negatives.
> 
> Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> justify it with one boolean over "a period of time" between two flushes, we
> set the boolean to TRUE as long as there is _one_ successful report of
> MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> implication of whether a fallback happened.
> 
> Hence, before this v2 patch, there can be false negative reported by QEMU,
> assuming there's no fallback (reflected in stats) but it actually happened.
> 
> After this patch, if QEMU reports fallback=0, it guarantees that _all_
> MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> all messages and accumulate any fallback cases.  Even if the messages can
> be merged, when "fallback" shows anything non-zero would imply some
> fallback happened.  Here, the counter value doesn't really matter much
> IMHO, as long as it becomes non-zero.

AFAICT, the v1 of this patch was sufficient to address the original
bug and maintain the current intended semantics of the migration
counter. This v2 is mixing a bug fix with functional change in
behaviour and I don't think the latter is justified.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 12:02             ` Daniel P. Berrangé
@ 2026-03-11 15:30               ` Peter Xu
  2026-03-11 16:56                 ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2026-03-11 15:30 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> > On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > > 
> > > > 
> > > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > 
> > > > > !-------------------------------------------------------------------|
> > > > >  CAUTION: External Email
> > > > > 
> > > > > |-------------------------------------------------------------------!
> > > > > 
> > > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > > >>>> 
> > > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > >>>> +            sioc->zero_copy_fallback++;
> > > > >>> 
> > > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > > >>> the same as the number of IO requests. That's why we only used it
> > > > >>> as a boolean marker originally, rather than making it a counter.
> > > > >> 
> > > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > > >> "messages" rather than "IOs" then.
> > > > > 
> > > > > IIUC it is a counter of processing notifications which is not directly
> > > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > > 
> > > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > > about what happened to an individual IO?
> > > 
> > > If userspace hasn't read a queued notification yet, the kernel will
> > > merge new notifications with the existing queued one.
> > > 
> > > The line above your change
> > > 
> > >   serr->ee_data - serr->ee_info + 1;
> > > 
> > > records how many notifications were merged, so we now how many
> > > syscalls were processed.
> > > 
> > > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > > one syscall resulted in a copy, but that doesn't imply that *all*
> > > syscalls resulted in a copy.
> > > 
> > > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > > 
> > > IIUC the kernel's merging of notifications appears lossy wrt this
> > > information. It could be partially mitigated by doing a flush for
> > > notifications really really frequently but that feels like it would
> > > have its own downsides
> > 
> > IMHO what this change does is removing the false negatives.
> > 
> > Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> > MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> > justify it with one boolean over "a period of time" between two flushes, we
> > set the boolean to TRUE as long as there is _one_ successful report of
> > MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> > at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> > implication of whether a fallback happened.
> > 
> > Hence, before this v2 patch, there can be false negative reported by QEMU,
> > assuming there's no fallback (reflected in stats) but it actually happened.
> > 
> > After this patch, if QEMU reports fallback=0, it guarantees that _all_
> > MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> > all messages and accumulate any fallback cases.  Even if the messages can
> > be merged, when "fallback" shows anything non-zero would imply some
> > fallback happened.  Here, the counter value doesn't really matter much
> > IMHO, as long as it becomes non-zero.
> 
> AFAICT, the v1 of this patch was sufficient to address the original
> bug and maintain the current intended semantics of the migration
> counter. This v2 is mixing a bug fix with functional change in
> behaviour and I don't think the latter is justified.

It's just that when it cannot report all fallback cases, I don't yet see
how it would help much even if we fix the previous behavior with v1..

OTOH, the new behavior will be deemed to have no issue on the problem v1
was fixing.

So IIUC v2's behavior is the one we want, and helps identify fallback
happened.

If to split the patch, we can merge v1, then change the behavior like v2,
but then we will need to revert v1 again because it's not needed with the
new behavior.  I'm OK either way.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 15:30               ` Peter Xu
@ 2026-03-11 16:56                 ` Daniel P. Berrangé
  2026-03-11 17:28                   ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-11 16:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Wed, Mar 11, 2026 at 11:30:26AM -0400, Peter Xu wrote:
> On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> > > On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > > > 
> > > > > 
> > > > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > 
> > > > > > !-------------------------------------------------------------------|
> > > > > >  CAUTION: External Email
> > > > > > 
> > > > > > |-------------------------------------------------------------------!
> > > > > > 
> > > > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > > > >>>> 
> > > > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > >>>> +            sioc->zero_copy_fallback++;
> > > > > >>> 
> > > > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > > > >>> the same as the number of IO requests. That's why we only used it
> > > > > >>> as a boolean marker originally, rather than making it a counter.
> > > > > >> 
> > > > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > > > >> "messages" rather than "IOs" then.
> > > > > > 
> > > > > > IIUC it is a counter of processing notifications which is not directly
> > > > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > > > 
> > > > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > > > about what happened to an individual IO?
> > > > 
> > > > If userspace hasn't read a queued notification yet, the kernel will
> > > > merge new notifications with the existing queued one.
> > > > 
> > > > The line above your change
> > > > 
> > > >   serr->ee_data - serr->ee_info + 1;
> > > > 
> > > > records how many notifications were merged, so we now how many
> > > > syscalls were processed.
> > > > 
> > > > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > > > one syscall resulted in a copy, but that doesn't imply that *all*
> > > > syscalls resulted in a copy.
> > > > 
> > > > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > > > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > > > 
> > > > IIUC the kernel's merging of notifications appears lossy wrt this
> > > > information. It could be partially mitigated by doing a flush for
> > > > notifications really really frequently but that feels like it would
> > > > have its own downsides
> > > 
> > > IMHO what this change does is removing the false negatives.
> > > 
> > > Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> > > MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> > > justify it with one boolean over "a period of time" between two flushes, we
> > > set the boolean to TRUE as long as there is _one_ successful report of
> > > MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> > > at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> > > implication of whether a fallback happened.
> > > 
> > > Hence, before this v2 patch, there can be false negative reported by QEMU,
> > > assuming there's no fallback (reflected in stats) but it actually happened.
> > > 
> > > After this patch, if QEMU reports fallback=0, it guarantees that _all_
> > > MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> > > all messages and accumulate any fallback cases.  Even if the messages can
> > > be merged, when "fallback" shows anything non-zero would imply some
> > > fallback happened.  Here, the counter value doesn't really matter much
> > > IMHO, as long as it becomes non-zero.
> > 
> > AFAICT, the v1 of this patch was sufficient to address the original
> > bug and maintain the current intended semantics of the migration
> > counter. This v2 is mixing a bug fix with functional change in
> > behaviour and I don't think the latter is justified.
> 
> It's just that when it cannot report all fallback cases, I don't yet see
> how it would help much even if we fix the previous behavior with v1..
> 
> OTOH, the new behavior will be deemed to have no issue on the problem v1
> was fixing.
> 
> So IIUC v2's behavior is the one we want, and helps identify fallback
> happened.

I don't consider v2 acceptable as the value its returning is an
meaningless counter that doesn't correlate to any quantity that
is used by QEMU, nor visible to users of QEMU.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 16:56                 ` Daniel P. Berrangé
@ 2026-03-11 17:28                   ` Peter Xu
  2026-03-11 17:46                     ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2026-03-11 17:28 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Wed, Mar 11, 2026 at 04:56:17PM +0000, Daniel P. Berrangé wrote:
> On Wed, Mar 11, 2026 at 11:30:26AM -0400, Peter Xu wrote:
> > On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> > > > On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > > > > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > > > > 
> > > > > > 
> > > > > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > > 
> > > > > > > !-------------------------------------------------------------------|
> > > > > > >  CAUTION: External Email
> > > > > > > 
> > > > > > > |-------------------------------------------------------------------!
> > > > > > > 
> > > > > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > > > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > > > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > > > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > > > > >>>> 
> > > > > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > > > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > > > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > >>>> +            sioc->zero_copy_fallback++;
> > > > > > >>> 
> > > > > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > > > > >>> the same as the number of IO requests. That's why we only used it
> > > > > > >>> as a boolean marker originally, rather than making it a counter.
> > > > > > >> 
> > > > > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > > > > >> "messages" rather than "IOs" then.
> > > > > > > 
> > > > > > > IIUC it is a counter of processing notifications which is not directly
> > > > > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > > > > 
> > > > > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > > > > about what happened to an individual IO?
> > > > > 
> > > > > If userspace hasn't read a queued notification yet, the kernel will
> > > > > merge new notifications with the existing queued one.
> > > > > 
> > > > > The line above your change
> > > > > 
> > > > >   serr->ee_data - serr->ee_info + 1;
> > > > > 
> > > > > records how many notifications were merged, so we now how many
> > > > > syscalls were processed.
> > > > > 
> > > > > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > > > > one syscall resulted in a copy, but that doesn't imply that *all*
> > > > > syscalls resulted in a copy.
> > > > > 
> > > > > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > > > > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > > > > 
> > > > > IIUC the kernel's merging of notifications appears lossy wrt this
> > > > > information. It could be partially mitigated by doing a flush for
> > > > > notifications really really frequently but that feels like it would
> > > > > have its own downsides
> > > > 
> > > > IMHO what this change does is removing the false negatives.
> > > > 
> > > > Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> > > > MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> > > > justify it with one boolean over "a period of time" between two flushes, we
> > > > set the boolean to TRUE as long as there is _one_ successful report of
> > > > MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> > > > at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> > > > implication of whether a fallback happened.
> > > > 
> > > > Hence, before this v2 patch, there can be false negative reported by QEMU,
> > > > assuming there's no fallback (reflected in stats) but it actually happened.
> > > > 
> > > > After this patch, if QEMU reports fallback=0, it guarantees that _all_
> > > > MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> > > > all messages and accumulate any fallback cases.  Even if the messages can
> > > > be merged, when "fallback" shows anything non-zero would imply some
> > > > fallback happened.  Here, the counter value doesn't really matter much
> > > > IMHO, as long as it becomes non-zero.
> > > 
> > > AFAICT, the v1 of this patch was sufficient to address the original
> > > bug and maintain the current intended semantics of the migration
> > > counter. This v2 is mixing a bug fix with functional change in
> > > behaviour and I don't think the latter is justified.
> > 
> > It's just that when it cannot report all fallback cases, I don't yet see
> > how it would help much even if we fix the previous behavior with v1..
> > 
> > OTOH, the new behavior will be deemed to have no issue on the problem v1
> > was fixing.
> > 
> > So IIUC v2's behavior is the one we want, and helps identify fallback
> > happened.
> 
> I don't consider v2 acceptable as the value its returning is an
> meaningless counter that doesn't correlate to any quantity that
> is used by QEMU, nor visible to users of QEMU.

It can be a boolean if we want showing "if any fallback happened", that'll
at least make it accurate and avoid false negatives. But IMHO a counter is
always better, e.g. when we dump it from time to time we know if any more
fallback happened.

In that case, no matter how that counter is defined in granularity that'll
help, as long as it get boosted when fallback happened.

I also don't expect this value to be consumed by an user, but only reported
by an user and should only be consumed by a developer.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 17:28                   ` Peter Xu
@ 2026-03-11 17:46                     ` Daniel P. Berrangé
  2026-03-11 18:43                       ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2026-03-11 17:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Wed, Mar 11, 2026 at 01:28:36PM -0400, Peter Xu wrote:
> On Wed, Mar 11, 2026 at 04:56:17PM +0000, Daniel P. Berrangé wrote:
> > On Wed, Mar 11, 2026 at 11:30:26AM -0400, Peter Xu wrote:
> > > On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> > > > > On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > > > > > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > !-------------------------------------------------------------------|
> > > > > > > >  CAUTION: External Email
> > > > > > > > 
> > > > > > > > |-------------------------------------------------------------------!
> > > > > > > > 
> > > > > > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > > > > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > > > > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > > > > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > > > > > >>>> 
> > > > > > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > > > > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > > > > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > > >>>> +            sioc->zero_copy_fallback++;
> > > > > > > >>> 
> > > > > > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > > > > > >>> the same as the number of IO requests. That's why we only used it
> > > > > > > >>> as a boolean marker originally, rather than making it a counter.
> > > > > > > >> 
> > > > > > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > > > > > >> "messages" rather than "IOs" then.
> > > > > > > > 
> > > > > > > > IIUC it is a counter of processing notifications which is not directly
> > > > > > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > > > > > 
> > > > > > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > > > > > about what happened to an individual IO?
> > > > > > 
> > > > > > If userspace hasn't read a queued notification yet, the kernel will
> > > > > > merge new notifications with the existing queued one.
> > > > > > 
> > > > > > The line above your change
> > > > > > 
> > > > > >   serr->ee_data - serr->ee_info + 1;
> > > > > > 
> > > > > > records how many notifications were merged, so we now how many
> > > > > > syscalls were processed.
> > > > > > 
> > > > > > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > > > > > one syscall resulted in a copy, but that doesn't imply that *all*
> > > > > > syscalls resulted in a copy.
> > > > > > 
> > > > > > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > > > > > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > > > > > 
> > > > > > IIUC the kernel's merging of notifications appears lossy wrt this
> > > > > > information. It could be partially mitigated by doing a flush for
> > > > > > notifications really really frequently but that feels like it would
> > > > > > have its own downsides
> > > > > 
> > > > > IMHO what this change does is removing the false negatives.
> > > > > 
> > > > > Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> > > > > MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> > > > > justify it with one boolean over "a period of time" between two flushes, we
> > > > > set the boolean to TRUE as long as there is _one_ successful report of
> > > > > MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> > > > > at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> > > > > implication of whether a fallback happened.
> > > > > 
> > > > > Hence, before this v2 patch, there can be false negative reported by QEMU,
> > > > > assuming there's no fallback (reflected in stats) but it actually happened.
> > > > > 
> > > > > After this patch, if QEMU reports fallback=0, it guarantees that _all_
> > > > > MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> > > > > all messages and accumulate any fallback cases.  Even if the messages can
> > > > > be merged, when "fallback" shows anything non-zero would imply some
> > > > > fallback happened.  Here, the counter value doesn't really matter much
> > > > > IMHO, as long as it becomes non-zero.
> > > > 
> > > > AFAICT, the v1 of this patch was sufficient to address the original
> > > > bug and maintain the current intended semantics of the migration
> > > > counter. This v2 is mixing a bug fix with functional change in
> > > > behaviour and I don't think the latter is justified.
> > > 
> > > It's just that when it cannot report all fallback cases, I don't yet see
> > > how it would help much even if we fix the previous behavior with v1..
> > > 
> > > OTOH, the new behavior will be deemed to have no issue on the problem v1
> > > was fixing.
> > > 
> > > So IIUC v2's behavior is the one we want, and helps identify fallback
> > > happened.
> > 
> > I don't consider v2 acceptable as the value its returning is an
> > meaningless counter that doesn't correlate to any quantity that
> > is used by QEMU, nor visible to users of QEMU.
> 
> It can be a boolean if we want showing "if any fallback happened", that'll
> at least make it accurate and avoid false negatives. But IMHO a counter is
> always better, e.g. when we dump it from time to time we know if any more
> fallback happened.
> 
> In that case, no matter how that counter is defined in granularity that'll
> help, as long as it get boosted when fallback happened.
> 
> I also don't expect this value to be consumed by an user, but only reported
> by an user and should only be consumed by a developer.

Ok, so the problem is that we've got a design inversion between what
the kernel is reporting and what the io channel is reporting.

With the kernel notifications we can determine

 * All syscalls successfully used zero copy
 * At least one syscall failed to use zero copy

whereas what the io channel flush is (claiming) to report is

 * 1 => all syscalls failed to use zero copy
 * 0 => at least one syscall successfully used zero copy

and you cannot infer the latter from the former, as we have missing
information due to merging of notifications.

So we need to invert the return values semantics of the flush method
to account for the missing information:

 * 1 => at least one syscall failed to use zero copy
 * 0 => all syscalls successfully used zero copy


With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 17:46                     ` Daniel P. Berrangé
@ 2026-03-11 18:43                       ` Peter Xu
  2026-03-16 16:26                         ` Tejus GK
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2026-03-11 18:43 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On Wed, Mar 11, 2026 at 05:46:56PM +0000, Daniel P. Berrangé wrote:
> On Wed, Mar 11, 2026 at 01:28:36PM -0400, Peter Xu wrote:
> > On Wed, Mar 11, 2026 at 04:56:17PM +0000, Daniel P. Berrangé wrote:
> > > On Wed, Mar 11, 2026 at 11:30:26AM -0400, Peter Xu wrote:
> > > > On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
> > > > > On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> > > > > > On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > > > > > > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > > > > 
> > > > > > > > > !-------------------------------------------------------------------|
> > > > > > > > >  CAUTION: External Email
> > > > > > > > > 
> > > > > > > > > |-------------------------------------------------------------------!
> > > > > > > > > 
> > > > > > > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > > > > > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > > > > > > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > > > > > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > > > > > > >>>> 
> > > > > > > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > > > > > > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > > > > > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > > > > > > >>>> +            sioc->zero_copy_fallback++;
> > > > > > > > >>> 
> > > > > > > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > > > > > > >>> the same as the number of IO requests. That's why we only used it
> > > > > > > > >>> as a boolean marker originally, rather than making it a counter.
> > > > > > > > >> 
> > > > > > > > >> Would the logic still work and better than before?  Say, it's a counter of
> > > > > > > > >> "messages" rather than "IOs" then.
> > > > > > > > > 
> > > > > > > > > IIUC it is a counter of processing notifications which is not directly
> > > > > > > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > > > > > > 
> > > > > > > > Please correct me if I'm wrong about this, isn’t each notification an information 
> > > > > > > > about what happened to an individual IO?
> > > > > > > 
> > > > > > > If userspace hasn't read a queued notification yet, the kernel will
> > > > > > > merge new notifications with the existing queued one.
> > > > > > > 
> > > > > > > The line above your change
> > > > > > > 
> > > > > > >   serr->ee_data - serr->ee_info + 1;
> > > > > > > 
> > > > > > > records how many notifications were merged, so we now how many
> > > > > > > syscalls were processed.
> > > > > > > 
> > > > > > > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > > > > > > one syscall resulted in a copy, but that doesn't imply that *all*
> > > > > > > syscalls resulted in a copy.
> > > > > > > 
> > > > > > > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > > > > > > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > > > > > > 
> > > > > > > IIUC the kernel's merging of notifications appears lossy wrt this
> > > > > > > information. It could be partially mitigated by doing a flush for
> > > > > > > notifications really really frequently but that feels like it would
> > > > > > > have its own downsides
> > > > > > 
> > > > > > IMHO what this change does is removing the false negatives.
> > > > > > 
> > > > > > Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> > > > > > MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> > > > > > justify it with one boolean over "a period of time" between two flushes, we
> > > > > > set the boolean to TRUE as long as there is _one_ successful report of
> > > > > > MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> > > > > > at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> > > > > > implication of whether a fallback happened.
> > > > > > 
> > > > > > Hence, before this v2 patch, there can be false negative reported by QEMU,
> > > > > > assuming there's no fallback (reflected in stats) but it actually happened.
> > > > > > 
> > > > > > After this patch, if QEMU reports fallback=0, it guarantees that _all_
> > > > > > MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> > > > > > all messages and accumulate any fallback cases.  Even if the messages can
> > > > > > be merged, when "fallback" shows anything non-zero would imply some
> > > > > > fallback happened.  Here, the counter value doesn't really matter much
> > > > > > IMHO, as long as it becomes non-zero.
> > > > > 
> > > > > AFAICT, the v1 of this patch was sufficient to address the original
> > > > > bug and maintain the current intended semantics of the migration
> > > > > counter. This v2 is mixing a bug fix with functional change in
> > > > > behaviour and I don't think the latter is justified.
> > > > 
> > > > It's just that when it cannot report all fallback cases, I don't yet see
> > > > how it would help much even if we fix the previous behavior with v1..
> > > > 
> > > > OTOH, the new behavior will be deemed to have no issue on the problem v1
> > > > was fixing.
> > > > 
> > > > So IIUC v2's behavior is the one we want, and helps identify fallback
> > > > happened.
> > > 
> > > I don't consider v2 acceptable as the value its returning is an
> > > meaningless counter that doesn't correlate to any quantity that
> > > is used by QEMU, nor visible to users of QEMU.
> > 
> > It can be a boolean if we want showing "if any fallback happened", that'll
> > at least make it accurate and avoid false negatives. But IMHO a counter is
> > always better, e.g. when we dump it from time to time we know if any more
> > fallback happened.
> > 
> > In that case, no matter how that counter is defined in granularity that'll
> > help, as long as it get boosted when fallback happened.
> > 
> > I also don't expect this value to be consumed by an user, but only reported
> > by an user and should only be consumed by a developer.
> 
> Ok, so the problem is that we've got a design inversion between what
> the kernel is reporting and what the io channel is reporting.
> 
> With the kernel notifications we can determine
> 
>  * All syscalls successfully used zero copy
>  * At least one syscall failed to use zero copy
> 
> whereas what the io channel flush is (claiming) to report is
> 
>  * 1 => all syscalls failed to use zero copy
>  * 0 => at least one syscall successfully used zero copy
> 
> and you cannot infer the latter from the former, as we have missing
> information due to merging of notifications.
> 
> So we need to invert the return values semantics of the flush method
> to account for the missing information:
> 
>  * 1 => at least one syscall failed to use zero copy
>  * 0 => all syscalls successfully used zero copy

Yep, this should be one good way to nail this problem.  Maybe Tejus, as a
real consumer of this counter, will have a preference on how it looks the
best.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
  2026-03-11 18:43                       ` Peter Xu
@ 2026-03-16 16:26                         ` Tejus GK
  0 siblings, 0 replies; 15+ messages in thread
From: Tejus GK @ 2026-03-16 16:26 UTC (permalink / raw)
  To: Peter Xu, Daniel P. Berrangé
  Cc: Tejus GK, qemu-devel@nongnu.org, Fabiano Rosas, Eric Blake,
	Markus Armbruster

On 12/03/26 12:13 am, Peter Xu wrote:
> !-------------------------------------------------------------------|
>   CAUTION: External Email
> |-------------------------------------------------------------------!
> On Wed, Mar 11, 2026 at 05:46:56PM +0000, Daniel P. Berrangé wrote:
>> On Wed, Mar 11, 2026 at 01:28:36PM -0400, Peter Xu wrote:
>>> On Wed, Mar 11, 2026 at 04:56:17PM +0000, Daniel P. Berrangé wrote:
>>>> On Wed, Mar 11, 2026 at 11:30:26AM -0400, Peter Xu wrote:
>>>>> On Wed, Mar 11, 2026 at 12:02:05PM +0000, Daniel P. Berrangé wrote:
>>>>>> On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
>>>>>>> On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
>>>>>>>> On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> !-------------------------------------------------------------------|
>>>>>>>>>>  CAUTION: External Email
>>>>>>>>>> 
>>>>>>>>>> |-------------------------------------------------------------------!
>>>>>>>>>> 
>>>>>>>>>> On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
>>>>>>>>>>> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
>>>>>>>>>>>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
>>>>>>>>>>>>> 
>>>>>>>>>>>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
>>>>>>>>>>>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
>>>>>>>>>>>>> -            sioc->new_zero_copy_sent_success = true;
>>>>>>>>>>>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
>>>>>>>>>>>>> +            sioc->zero_copy_fallback++;
>>>>>>>>>>>> 
>>>>>>>>>>>> ...this is counting the number of MSG_ERRQUEUE items, which is not
>>>>>>>>>>>> the same as the number of IO requests. That's why we only used it
>>>>>>>>>>>> as a boolean marker originally, rather than making it a counter.
>>>>>>>>>>> 
>>>>>>>>>>> Would the logic still work and better than before?  Say, it's a counter of
>>>>>>>>>>> "messages" rather than "IOs" then.
>>>>>>>>>> 
>>>>>>>>>> IIUC it is a counter of processing notifications which is not directly
>>>>>>>>>> correlated to any action by QEMU - neither bytes nor syscalls.
>>>>>>>>> 
>>>>>>>>> Please correct me if I'm wrong about this, isn’t each notification an information
>>>>>>>>> about what happened to an individual IO?
>>>>>>>> 
>>>>>>>> If userspace hasn't read a queued notification yet, the kernel will
>>>>>>>> merge new notifications with the existing queued one.
>>>>>>>> 
>>>>>>>> The line above your change
>>>>>>>> 
>>>>>>>>   serr->ee_data - serr->ee_info + 1;
>>>>>>>> 
>>>>>>>> records how many notifications were merged, so we now how many
>>>>>>>> syscalls were processed.
>>>>>>>> 
>>>>>>>> If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
>>>>>>>> one syscall resulted in a copy, but that doesn't imply that *all*
>>>>>>>> syscalls resulted in a copy.
>>>>>>>> 
>>>>>>>> AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
>>>>>>>> or it could be 1000 out of 1000 resulted in a copy. We don't know.
>>>>>>>> 
>>>>>>>> IIUC the kernel's merging of notifications appears lossy wrt this
>>>>>>>> information. It could be partially mitigated by doing a flush for
>>>>>>>> notifications really really frequently but that feels like it would
>>>>>>>> have its own downsides
>>>>>>> 
>>>>>>> IMHO what this change does is removing the false negatives.
>>>>>>> 
>>>>>>> Before this patch, if QEMU reports fallback=0, it doesn't mean all the
>>>>>>> MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
>>>>>>> justify it with one boolean over "a period of time" between two flushes, we
>>>>>>> set the boolean to TRUE as long as there is _one_ successful report of
>>>>>>> MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
>>>>>>> at least one MSG_ZEROCOPY request that didn't fallback".  It has no
>>>>>>> implication of whether a fallback happened.
>>>>>>> 
>>>>>>> Hence, before this v2 patch, there can be false negative reported by QEMU,
>>>>>>> assuming there's no fallback (reflected in stats) but it actually happened.
>>>>>>> 
>>>>>>> After this patch, if QEMU reports fallback=0, it guarantees that _all_
>>>>>>> MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
>>>>>>> all messages and accumulate any fallback cases.  Even if the messages can
>>>>>>> be merged, when "fallback" shows anything non-zero would imply some
>>>>>>> fallback happened.  Here, the counter value doesn't really matter much
>>>>>>> IMHO, as long as it becomes non-zero.
>>>>>> 
>>>>>> AFAICT, the v1 of this patch was sufficient to address the original
>>>>>> bug and maintain the current intended semantics of the migration
>>>>>> counter. This v2 is mixing a bug fix with functional change in
>>>>>> behaviour and I don't think the latter is justified.
>>>>> 
>>>>> It's just that when it cannot report all fallback cases, I don't yet see
>>>>> how it would help much even if we fix the previous behavior with v1..
>>>>> 
>>>>> OTOH, the new behavior will be deemed to have no issue on the problem v1
>>>>> was fixing.
>>>>> 
>>>>> So IIUC v2's behavior is the one we want, and helps identify fallback
>>>>> happened.
>>>> 
>>>> I don't consider v2 acceptable as the value its returning is an
>>>> meaningless counter that doesn't correlate to any quantity that
>>>> is used by QEMU, nor visible to users of QEMU.
>>> 
>>> It can be a boolean if we want showing "if any fallback happened", that'll
>>> at least make it accurate and avoid false negatives. But IMHO a counter is
>>> always better, e.g. when we dump it from time to time we know if any more
>>> fallback happened.
>>> 
>>> In that case, no matter how that counter is defined in granularity that'll
>>> help, as long as it get boosted when fallback happened.
>>> 
>>> I also don't expect this value to be consumed by an user, but only reported
>>> by an user and should only be consumed by a developer.
>> 
>> Ok, so the problem is that we've got a design inversion between what
>> the kernel is reporting and what the io channel is reporting.
>> 
>> With the kernel notifications we can determine
>> 
>>  * All syscalls successfully used zero copy
>>  * At least one syscall failed to use zero copy
>> 
>> whereas what the io channel flush is (claiming) to report is
>> 
>>  * 1 => all syscalls failed to use zero copy
>>  * 0 => at least one syscall successfully used zero copy
>> 
>> and you cannot infer the latter from the former, as we have missing
>> information due to merging of notifications.
>> 
>> So we need to invert the return values semantics of the flush method
>> to account for the missing information:
>> 
>>  * 1 => at least one syscall failed to use zero copy
>>  * 0 => all syscalls successfully used zero copy
> Yep, this should be one good way to nail this problem.  Maybe Tejus, as a
> real consumer of this counter, will have a preference on how it looks the
> best.
> Thanks,
Hi all! Thank you for the suggestions, and apologies on the delay on this.

>  * 1 => at least one syscall failed to use zero copy
>  * 0 => all syscalls successfully used zero copy

I think this return semantic seems appropriate, and avoids the false positives like earlier. I can spin up a v3 
if everyone agrees on this?


Regards,
Tejus

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-03-16 16:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-09  9:09 [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate Tejus GK
2026-03-09 16:48 ` Daniel P. Berrangé
2026-03-09 16:59   ` Peter Xu
2026-03-09 17:17     ` Daniel P. Berrangé
2026-03-09 17:42       ` Tejus GK
2026-03-09 17:51         ` Daniel P. Berrangé
2026-03-09 18:21           ` Peter Xu
2026-03-11 12:02             ` Daniel P. Berrangé
2026-03-11 15:30               ` Peter Xu
2026-03-11 16:56                 ` Daniel P. Berrangé
2026-03-11 17:28                   ` Peter Xu
2026-03-11 17:46                     ` Daniel P. Berrangé
2026-03-11 18:43                       ` Peter Xu
2026-03-16 16:26                         ` Tejus GK
2026-03-10  8:52 ` Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.