From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Leonardo Brás" <leobras@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [PATCH v1 2/2] migration/multifd: Warn user when zerocopy not working
Date: Mon, 4 Jul 2022 10:19:43 +0100 [thread overview]
Message-ID: <YsKwrz8t6VgKhm4V@work-vm> (raw)
In-Reply-To: <66da1d2d1617c61012a515fc3104866ee5d49f69.camel@redhat.com>
* Leonardo Brás (leobras@redhat.com) wrote:
> On Tue, 2022-06-28 at 17:56 +0100, Dr. David Alan Gilbert wrote:
> > * Leonardo Bras Soares Passos (leobras@redhat.com) wrote:
> > > On Tue, Jun 28, 2022 at 10:52 AM Dr. David Alan Gilbert
> > > <dgilbert@redhat.com> wrote:
> > > >
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos
> > > > > wrote:
> > > > > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé
> > > > > > <berrange@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote:
> > > > > > > > Some errors, like the lack of Scatter-Gather support by the
> > > > > > > > network
> > > > > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail
> > > > > > > > on using
> > > > > > > > zero-copy, which causes it to fall back to the default copying
> > > > > > > > mechanism.
> > > > > > >
> > > > > > > How common is this lack of SG support ? What NICs did you have that
> > > > > > > were affected ?
> > > > > >
> > > > > > I am not aware of any NIC without SG available for testing, nor have
> > > > > > any idea on how common they are.
> > > > > > But since we can detect sendmsg() falling back to copying we should
> > > > > > warn the user if this ever happens.
> > > > > >
> > > > > > There is also a case in IPv6 related to fragmentation that may cause
> > > > > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also
> > > > > > covered.
> > > > > >
> > > > > > >
> > > > > > > > After each full dirty-bitmap scan there should be a zero-copy
> > > > > > > > flush
> > > > > > > > happening, which checks for errors each of the previous calls to
> > > > > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy,
> > > > > > > > then
> > > > > > > > warn the user about it.
> > > > > > > >
> > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst
> > > > > > > > case
> > > > > > > > scenario it should not print a lot of warnings, and will allow
> > > > > > > > tracking
> > > > > > > > how many dirty-bitmap iterations were not able to use zero-copy
> > > > > > > > send.
> > > > > > >
> > > > > > > For long running migrations which are not converging, or converging
> > > > > > > very slowly there could be 100's of passes.
> > > > > > >
> > > > > >
> > > > > > I could change it so it only warns once, if that is too much output.
> > > > >
> > > > > Well I'm mostly wondering what we're expecting the user todo with this
> > > > > information.
> > >
> > >
> > > My rationale on that:
> > > - zero-copy-send is a feature that is supposed to improve send
> > > throughput by reducing cpu usage.
> > > - there is a chance the sendmsg(MSG_ZEROCOPY) fails to use zero-copy
> > > - if this happens, there will be a potential throughput decrease on
> > > sendmsg()
> > > - the user (or management app) need to know when zero-copy-send is
> > > degrading throughput, so it can be disabled
> > > - this is also important for performance testing, given it can be
> > > confusing having zero-copy-send improving throughput in some cases,
> > > and degrading in others, without any apparent reason why.
> > >
> > > > > Generally a log file containing warnings ends up turning
> > > > > into a bug report. If we think it is important for users and/or mgmt
> > > > > apps to be aware of this info, then it might be better to actually
> > > > > put a field in the query-migrate stats to report if zero-copy is
> > > > > being honoured or not,
> > > >
> > > > Yeh just a counter would work there I think.
> > >
> > > The warning idea was totally due to my inexperience on this mgmt app
> > > interface, since I had no other idea on how to deal with that.
> >
> > Yeh it's not too silly an idea!
> > The way some of these warning or stats get to us can be a bit random,
> > but sometimes can confuse things.
> >
> > > I think having it in query-migrate is a much better idea than a
> > > warning, since it should be much easier to parse and disable
> > > zero-copy-send if desired.
> > > Even in my current qemu test script, it's much better having it in
> > > query-migrate.
> > >
> > > >
> > > > > and just have a trace point in this location
> > > > > instead.
> > > >
> > > > Yeh.
> > > >
> > >
> > > Yeap, the counter idea seems great!
> > > Will it be always printed there, or only when zero-copy-send is enabled?
> >
> > You could make it either if it's enabled or if it's none zero.
> > (I guess you want it to reset to 0 at the start of a new migration).
> >
> > Dave
>
> Thanks for this feedback!
>
> I have everything already working, but I am struggling with a good property
> name.
>
> I am currently using zero_copy_copied (or zero-copy-copied in json), but it does
> not look like a good Migration stat name.
>
> Do you have any suggestion?
Shrug; I'm not going to fuss over the name too much as long as it's
reasonable. 'zero-copied' might be OK.
Dave
> Best regards,
> Leo
>
>
> >
> > >
> > > Best regards,
> > > Leo
> > >
> > > > Dave
> > > >
> > > > > With regards,
> > > > > Daniel
> > > > > --
> > > > > > : https://berrange.com -o-
> > > > > > https://www.flickr.com/photos/dberrange :|
> > > > > > : https://libvirt.org -o-
> > > > > > https://fstop138.berrange.com :|
> > > > > > : https://entangle-photo.org -o-
> > > > > > https://www.instagram.com/dberrange :|
> > > > >
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > >
> > >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
prev parent reply other threads:[~2022-07-04 9:21 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-28 1:09 [PATCH v1 0/2] Zero copy improvements (QIOChannel + multifd) Leonardo Bras
2022-06-28 1:09 ` [PATCH v1 1/2] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
2022-06-28 1:09 ` [PATCH v1 2/2] migration/multifd: Warn user when zerocopy not working Leonardo Bras
2022-06-28 7:53 ` Daniel P. Berrangé
2022-06-28 12:32 ` Leonardo Bras Soares Passos
2022-06-28 13:02 ` Daniel P. Berrangé
2022-06-28 13:52 ` Dr. David Alan Gilbert
2022-06-28 16:53 ` Leonardo Bras Soares Passos
2022-06-28 16:56 ` Dr. David Alan Gilbert
2022-07-01 6:18 ` Leonardo Brás
2022-07-04 9:19 ` Dr. David Alan Gilbert [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YsKwrz8t6VgKhm4V@work-vm \
--to=dgilbert@redhat.com \
--cc=berrange@redhat.com \
--cc=leobras@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.