From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Tejus GK <tejus.gk@nutanix.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Fabiano Rosas <farosas@suse.de>, Eric Blake <eblake@redhat.com>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate
Date: Mon, 9 Mar 2026 14:21:49 -0400 [thread overview]
Message-ID: <aa8Pvd5O-OLKIXam@x1.local> (raw)
In-Reply-To: <aa8IoQOZHzQvacSh@redhat.com>
On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> >
> >
> > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > !-------------------------------------------------------------------|
> > > CAUTION: External Email
> > >
> > > |-------------------------------------------------------------------!
> > >
> > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > >>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > >>>> sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > >>>>
> > >>>> /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
> > >>>> - if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> - sioc->new_zero_copy_sent_success = true;
> > >>>> + if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> + sioc->zero_copy_fallback++;
> > >>>
> > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > >>> the same as the number of IO requests. That's why we only used it
> > >>> as a boolean marker originally, rather than making it a counter.
> > >>
> > >> Would the logic still work and better than before? Say, it's a counter of
> > >> "messages" rather than "IOs" then.
> > >
> > > IIUC it is a counter of processing notifications which is not directly
> > > correlated to any action by QEMU - neither bytes nor syscalls.
> >
> > Please correct me if I'm wrong about this, isn’t each notification an information
> > about what happened to an individual IO?
>
> If userspace hasn't read a queued notification yet, the kernel will
> merge new notifications with the existing queued one.
>
> The line above your change
>
> serr->ee_data - serr->ee_info + 1;
>
> records how many notifications were merged, so we now how many
> syscalls were processed.
>
> If ee_code is SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> one syscall resulted in a copy, but that doesn't imply that *all*
> syscalls resulted in a copy.
>
> AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> or it could be 1000 out of 1000 resulted in a copy. We don't know.
>
> IIUC the kernel's merging of notifications appears lossy wrt this
> information. It could be partially mitigated by doing a flush for
> notifications really really frequently but that feels like it would
> have its own downsides
IMHO what this change does is removing the false negatives.
Before this patch, if QEMU reports fallback=0, it doesn't mean all the
MSG_ZEROCOPY requests were all fulfilled by zerocopy. It's because we
justify it with one boolean over "a period of time" between two flushes, we
set the boolean to TRUE as long as there is _one_ successful report of
MSG_ZEROCOPY. So even if every flush reports TRUE it only means "there is
at least one MSG_ZEROCOPY request that didn't fallback". It has no
implication of whether a fallback happened.
Hence, before this v2 patch, there can be false negative reported by QEMU,
assuming there's no fallback (reflected in stats) but it actually happened.
After this patch, if QEMU reports fallback=0, it guarantees that _all_
MSG_ZEROCOPY requests are fulfilled with zerocopy. It's because we monitor
all messages and accumulate any fallback cases. Even if the messages can
be merged, when "fallback" shows anything non-zero would imply some
fallback happened. Here, the counter value doesn't really matter much
IMHO, as long as it becomes non-zero.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2026-03-09 18:22 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-09 9:09 [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate Tejus GK
2026-03-09 16:48 ` Daniel P. Berrangé
2026-03-09 16:59 ` Peter Xu
2026-03-09 17:17 ` Daniel P. Berrangé
2026-03-09 17:42 ` Tejus GK
2026-03-09 17:51 ` Daniel P. Berrangé
2026-03-09 18:21 ` Peter Xu [this message]
2026-03-11 12:02 ` Daniel P. Berrangé
2026-03-11 15:30 ` Peter Xu
2026-03-11 16:56 ` Daniel P. Berrangé
2026-03-11 17:28 ` Peter Xu
2026-03-11 17:46 ` Daniel P. Berrangé
2026-03-11 18:43 ` Peter Xu
2026-03-16 16:26 ` Tejus GK
2026-03-10 8:52 ` Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa8Pvd5O-OLKIXam@x1.local \
--to=peterx@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=eblake@redhat.com \
--cc=farosas@suse.de \
--cc=qemu-devel@nongnu.org \
--cc=tejus.gk@nutanix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.