From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: "Wang, Lei" <lei4.wang@intel.com>,
	qemu-devel@nongnu.org,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>
Subject: Re: [RFC PATCH 6/7] migration/multifd: Move payload storage out of the channel parameters
Date: Fri, 12 Jul 2024 09:44:02 -0300
Message-ID: <875xtaeq5p.fsf@suse.de>
In-Reply-To: <ZpBXejPa8Vac6ZqG@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 11, 2024 at 06:12:44PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, Jul 11, 2024 at 04:37:34PM -0300, Fabiano Rosas wrote:
>> >
>> > [...]
>> >
>> >> We also don't flush the iov at once, so f->buf seems redundant to
>> >> me. But of course, if we touch any of that we must ensure we're not
>> >> dropping any major optimization.
>> >
>> > Yes, some tests of that would be more persuasive when it comes.
>> >
>> > Per my limited experience in the past few years: memcpy on chips nowadays
>> > is pretty cheap.  You'll see one more example of that very soon when you
>> > start to look at the qatzip series: that series decided to do one more
>> > memcpy of all guest pages, to build one larger buffer instead of
>> > submitting the compression tasks in 4k chunks (while I thought 4k wasn't
>> > too small itself).
>> >
>> > That may be more involved, so it may not be a great example (e.g. the
>> > compression algo can be special in this case in that it just likes larger
>> > buffers), but it's not uncommon to see people trade things off against a
>> > memcpy, especially for small buffers.
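
Just so I'm sure I read that trade-off correctly, this is the kind of
coalescing I understand you mean -- a standalone sketch only, with zlib
as a stand-in; none of these names are from the qatzip series:

/* Copy scattered 4k pages into one contiguous buffer with memcpy,
 * then compress the whole chunk in a single call instead of one
 * compression task per page. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 4096

static int compress_batch(uint8_t **pages, size_t npages,
                          uint8_t *out, uLongf *out_len)
{
    uint8_t *chunk = malloc(npages * PAGE_SIZE);
    if (!chunk) {
        return -1;
    }
    /* The extra memcpy being traded for a larger submission. */
    for (size_t i = 0; i < npages; i++) {
        memcpy(chunk + i * PAGE_SIZE, pages[i], PAGE_SIZE);
    }
    int ret = compress(out, out_len, chunk, npages * PAGE_SIZE);
    free(chunk);
    return ret == Z_OK ? 0 : -1;
}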
>> >
>> > [...]
>> >
>> >> Any piece of code that fills an iov with data could in principle
>> >> send that data through multifd. From this perspective, multifd is just a
>> >> way to give work to an iochannel. We don't *need* to use it, but it
>> >> might be simple enough that the benefit of ditching QEMUFile can be
>> >> reached without too much rework.
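
To make that concrete, the rough shape I have in mind is something like
the sketch below -- purely hypothetical, neither MultiFDPayload nor
multifd_queue_payload() exist today:

/* Hypothetical sketch: any code that can fill an iovec hands it to
 * multifd instead of writing through QEMUFile. */
#include <stddef.h>
#include <sys/uio.h>

typedef struct MultiFDPayload {
    struct iovec *iov;
    size_t iovcnt;
    void (*release)(struct MultiFDPayload *p); /* called once sent */
} MultiFDPayload;

/* Imagined entry point: blocks until a channel is free, then hands
 * the iov to one of the multifd send threads. */
int multifd_queue_payload(MultiFDPayload *p);

static int send_some_state(struct iovec *iov, size_t iovcnt)
{
    MultiFDPayload p = { .iov = iov, .iovcnt = iovcnt };
    return multifd_queue_payload(&p);
}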
>> >> 
>> >> Say we provision multifd threads early and leave them waiting for any
>> >> part of the migration code to send some data. We could have n-1 threads
>> >> idle waiting for the bulk of the data and use a single thread for any
>> >> early traffic that does not need to be parallel.
>> >> 
>> >> I'm not suggesting we do any of this right away or even that this is the
>> >> correct way to go, I'm just letting you know some of my ideas and why I
>> >> think ram + device state might not be the only data we put through
>> >> multifd.
>> >
>> > We can wait and see whether that can be of any use in the future; even if
>> > so, we still have the chance to add more types to the union, I think.  But
>> > again, I don't expect it.
>> >
>> > My gut feeling: we shouldn't bother putting any (1) non-huge-chunk, or (2)
>> > non-IO, data onto multifd.  Again, I would ask "why not the main channel",
>> > otherwise.
>> >
>> > [...]
>> >
>> >> Just to be clear, do you want a thread-pool to replace multifd? Or would
>> >> that be only used for concurrency on the producer side?
>> >
>> > Not replace multifd.  It's just that I was imagining multifd threads only
>> > manage IO stuff, nothing else.
>> >
>> > I was indeed wondering whether we could reuse multifd threads, but then I
>> > found there's a risk of mangling these two concepts: when we do more than
>> > IO in multifd threads (e.g., talking to the VFIO kernel to fetch data,
>> > which can block), we risk blocking IO even when we could push more, so the
>> > NICs can go idle again.  There's also the complexity where a job fetches
>> > data from the VFIO kernel and wants to enqueue again; it means a multifd
>> > task can enqueue to itself, and circular enqueuing can be challenging:
>> > imagine 8 concurrent tasks (with a total of 8 multifd threads) trying to
>> > enqueue at the same time; they starve themselves to death.  Things like
>> > that.  Then I figured the remaining jobs are really fn(void*) type of
>> > things; they deserve their own pool of threads.
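
Right, that's a fair concern. A contrived standalone illustration of the
self-enqueue trap (not QEMU code): with a bounded queue and workers that
push follow-up work into the same queue, every slot can end up held while
every worker is blocked in the push:

/* Contrived illustration of the self-enqueue hazard; not QEMU code.
 * SLOTS worker threads share a bounded queue of SLOTS entries.  Each
 * worker, as part of handling a task, pushes a follow-up task into the
 * same queue.  The consumer side is elided: if the only threads that
 * could drain the queue are the workers currently blocked pushing,
 * the pool starves itself. */
#include <pthread.h>

#define SLOTS 8

typedef struct {
    void (*fn)(void *);
    void *arg;
} Task;

static Task ring[SLOTS];
static int used;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

static void queue_push(Task t)
{
    pthread_mutex_lock(&lock);
    while (used == SLOTS) {
        /* All 8 workers can end up parked here at the same time. */
        pthread_cond_wait(&not_full, &lock);
    }
    ring[used++] = t;
    pthread_mutex_unlock(&lock);
}

/* Worker body: fetch more data from the VFIO kernel driver (which can
 * block), then re-enqueue itself for the next chunk. */
static void vfio_fetch_task(void *arg)
{
    /* ... read device state, possibly blocking in the kernel ... */
    queue_push((Task){ .fn = vfio_fetch_task, .arg = arg });
}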
>> >
>> > So the VFIO threads (which used to be per-device) become migration worker
>> > threads, and we need them on both src/dst: on dst there's still pending
>> > work to apply the continuous VFIO data back to the kernel driver, and that
>> > can't be done by the multifd threads either, for similar reasons.  Then
>> > those dest-side worker threads can also do load() not only for VFIO but
>> > also for other device states if we add more.
>> >
>> > So to summarize, we'll have:
>> >
>> >   - 1 main thread (send / recv)
>> >   - N multifd threads (IOs only)
>> >   - M worker threads (jobs only)
>> >
>> > Of course, postcopy is not involved.  How's that sound?
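
On the "M worker threads" part: if the jobs really are fn(void*) things,
the pool could be as thin as a plain GThreadPool (which we already get
from glib) -- a hedged sketch only, none of these names exist in the
tree:

/* Hedged sketch: a dedicated pool for fn(void*) migration jobs,
 * separate from the multifd IO threads. */
#include <glib.h>

typedef struct {
    void (*fn)(void *);
    void *opaque;
} MigJob;   /* hypothetical name */

static void mig_worker(gpointer data, gpointer user_data)
{
    MigJob *job = data;
    job->fn(job->opaque);   /* e.g. fetch or load VFIO device state */
    g_free(job);
}

static GThreadPool *mig_workers_start(int m_threads, GError **err)
{
    /* exclusive=TRUE: M dedicated threads for this pool only */
    return g_thread_pool_new(mig_worker, NULL, m_threads, TRUE, err);
}

static void mig_workers_submit(GThreadPool *pool,
                               void (*fn)(void *), void *opaque)
{
    MigJob *job = g_new0(MigJob, 1);
    job->fn = fn;
    job->opaque = opaque;
    g_thread_pool_push(pool, job, NULL);
}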
>> 
>> Looks good. There's a better divide between producer and consumer this
>> way. I think it will help when designing new features.
>> 
>> One observation is that we'll still have two different entities doing IO
>> (multifd threads and the migration thread), which I would prefer used
>> common code at a higher level than the iochannel.
>
> At least for the main channel, probably yes.  I think Dan has had the idea
> of adding a buffering layer over iochannels and then replacing qemufiles
> with that.  Multifd channels look ok so far to use as raw channels.
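
For reference, this is roughly what I picture for such a buffering layer
-- a rough sketch with hypothetical names, not Dan's actual design; the
writev callback would presumably be backed by a QIOChannel (e.g.
qio_channel_writev_full()):

/* Rough sketch of a buffering layer: coalesce small writes and only
 * call into the underlying channel when the buffer fills or on an
 * explicit flush. */
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

typedef ssize_t (*writev_fn)(void *opaque, const struct iovec *iov, size_t n);

typedef struct BufferedWriter {
    writev_fn writev;        /* backed by the real channel */
    void *opaque;
    char buf[64 * 1024];
    size_t used;
} BufferedWriter;

static int buffered_flush(BufferedWriter *w)
{
    if (w->used) {
        struct iovec iov = { .iov_base = w->buf, .iov_len = w->used };
        if (w->writev(w->opaque, &iov, 1) < 0) {
            return -1;
        }
        w->used = 0;
    }
    return 0;
}

static int buffered_write(BufferedWriter *w, const void *data, size_t len)
{
    if (w->used + len > sizeof(w->buf) && buffered_flush(w) < 0) {
        return -1;
    }
    if (len >= sizeof(w->buf)) {
        /* Large payloads bypass the buffer entirely. */
        struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
        return w->writev(w->opaque, &iov, 1) < 0 ? -1 : 0;
    }
    memcpy(w->buf + w->used, data, len);
    w->used += len;
    return 0;
}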
>
>> 
>> One thing that I tried to look into for mapped-ram was whether we could
>> set up iouring in the migration code, but I got entirely discouraged by
>> the migration thread doing IO at random points. And of course, you've
>> seen what we had to do with direct-io. That was in part due to having
>> the migration thread in parallel doing its small writes at undetermined
>> points in time.
>
> On the locked_vm part: probably yes, we'd better try to avoid using page
> pinning if possible. It just looks like putting VMs into containers is
> becoming a more common scenario nowadays, which means such a feature may
> not always be usable there.
>
> For the rest: I really don't know much about iouring, but I remember it's
> normally only fast in a poll model with an interrupt-less context?

I'm not sure. I mainly thought about using it to save syscalls on the
write side. But as I said, I didn't look further into it.
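
For completeness, the write-side batching I had in mind was roughly the
following (a liburing sketch only, not wired into the migration code,
and the batching granularity is made up):

/* Queue a batch of writevs and submit them with a single
 * io_uring_enter(), instead of one writev syscall per chunk. */
#include <liburing.h>
#include <sys/uio.h>

static int write_batch(int fd, struct iovec *iovs, unsigned n, off_t *offsets)
{
    struct io_uring ring;
    int ret = io_uring_queue_init(n, &ring, 0);
    if (ret < 0) {
        return ret;
    }
    for (unsigned i = 0; i < n; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_writev(sqe, fd, &iovs[i], 1, offsets[i]);
    }
    ret = io_uring_submit(&ring);       /* one syscall for the batch */
    if (ret < 0) {
        goto out;
    }
    ret = 0;
    for (unsigned i = 0; i < n; i++) {  /* reap completions */
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            if (cqe->res < 0 && ret == 0) {
                ret = cqe->res;
            }
            io_uring_cqe_seen(&ring, cqe);
        }
    }
out:
    io_uring_queue_exit(&ring);
    return ret;
}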

> Not sure
> whether it suits us here, as I guess we should avoid consuming cpu
> resources without a good reason, and polling for perf falls into that
> category, I think.  Even without it, kubevirt already has an issue with
> multifd eating cpus, and people observe multifd threads causing vcpu
> threads to be throttled, interrupting guest workloads; they're currently
> put in the same container.  I'm also not sure how much it'll help compared
> to when we have the multi-threading ready.  I suspect not that much.

Do you have a reference for that kubevirt issue I could look at? It
may be interesting to investigate further. Where's the throttling coming
from? And doesn't less vcpu time imply less dirtying and therefore
faster convergence?


