From: Peter Xu <peterx@redhat.com>
To: "Wang, Wei W" <wei.w.wang@intel.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Date: Fri, 9 Jul 2021 10:48:00 -0400	[thread overview]
Message-ID: <YOhhoHJFyiQAEBRZ@t490s> (raw)
In-Reply-To: <b242b77a68c64ae9aa13ae0dc6c081ec@intel.com>

On Fri, Jul 09, 2021 at 08:58:08AM +0000, Wang, Wei W wrote:
> On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > > Yes I think this is the place I didn't make myself clear.  It's not
> > > > about sleeping, it's about the cmpxchg being expensive already when
> > > > the vm is huge.
> > >
> > > OK.
> > > How did you root cause that it's caused by cmpxchg, instead of lock
> > > contention (i.e. syscall and sleep) or some other code inside
> > > pthread_mutex_lock()?  Do you have cycle counts for cmpxchg vs. cycles
> > > for pthread_mutex_lock()?
> > 
> > We've got "perf top -g" showing a huge number of stacks landing in
> > pthread_mutex_lock().
> 
> This only shows that pthread_mutex_lock() is the cause; it doesn't root-cause it to cmpxchg.

I think that's already enough to justify moving the lock elsewhere.

It's not really a heavy race between threads; it's the pure overhead of
calling it too many times.  So it's not yet a question of "what type of lock
we should use" (mutex or spinlock) or "how the lock is implemented" (say,
using cmpxchg only, or optimizing with test + test-and-set, which sounds like
a good optimization of a pure test-and-set spinlock), because the lock is not
busy at all.

If we had two busy threads genuinely contending on the lock, that would be
another problem, imho.  It's not yet the case.
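
(For clarity, by "test + test-and-set" I mean something like the toy sketch
below.  It's illustrative only, not QEMU code, and all the names are made up;
the point stands that neither variant matters much while the lock is idle.)

#include <stdatomic.h>

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} toy_spinlock;

/* Pure test-and-set: every attempt is an atomic read-modify-write, which
 * bounces the cache line between CPUs even while the lock is held. */
static void toy_spin_lock_tas(toy_spinlock *l)
{
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire)) {
        /* spin */
    }
}

/* Test + test-and-set: spin on a plain load first and only issue the atomic
 * RMW when the lock looks free, so waiters mostly just read a shared cache
 * line instead of fighting over it. */
static void toy_spin_lock_ttas(toy_spinlock *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            /* read-only spin */
        }
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire)) {
            return;
        }
    }
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}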

> 
> > 
> > >
> > > I checked the implementation of pthread_mutex_lock().  The code path for
> > > lock acquisition is long.  QemuSpin looks more efficient.
> > > (probably we also don't want the migration thread to sleep in any case)
> > 
> > I didn't check it, but I really hope it's very close to a spinlock version
> > when it's not contended.  We should be careful about using spinlocks in
> > userspace; e.g., moving the clear-log call into the critical section would
> > be too much and actually close to "wrong", imho, because the kvm memslot
> > lock inside it is sleepable.
> 
> OK, that might be a good reason to keep clear_map out of the lock.
> We also don't want the lock holder to have more chances to go to sleep, even though it is a mutex.

Yes, that looks fine to me, but that's really another topic.  It's not a
requirement either, at least not before a spinlock is justified to be better
than a mutex here.
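
(To illustrate what I mean by "close to a spinlock when not contended": a
futex-based mutex, glibc's included, typically does a single cmpxchg on the
uncontended fast path and only enters the kernel when there is contention.
Below is a rough sketch in the style of Drepper's "Futexes Are Tricky" paper,
not glibc's actual code; the names are made up.)

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State: 0 = unlocked, 1 = locked, 2 = locked with waiters. */
static void toy_mutex_lock(atomic_int *m)
{
    int c = 0;

    /* Fast path: one cmpxchg, comparable in cost to acquiring a spinlock. */
    if (atomic_compare_exchange_strong(m, &c, 1)) {
        return;
    }

    /* Slow path: mark the mutex contended and sleep in the kernel. */
    if (c != 2) {
        c = atomic_exchange(m, 2);
    }
    while (c != 0) {
        syscall(SYS_futex, m, FUTEX_WAIT, 2, NULL, NULL, 0);
        c = atomic_exchange(m, 2);
    }
}

static void toy_mutex_unlock(atomic_int *m)
{
    /* If someone may be waiting (old value 2), wake one waiter up. */
    if (atomic_exchange(m, 0) == 2) {
        syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}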

> 
> > 
> > I think it's perfectly fine to have the migration thread sleep.  IMHO we
> > need explicit justification for each mutex converted to a spinlock in
> > userspace.  So far I don't see it for this bitmap lock.
> 
> What if the guest gets stopped and then the migration thread goes to sleep?

Doesn't the balloon code run in a standalone iothread?  Why would stopping the
guest stop the migration thread?

> 
> > 
> > Frankly I also don't know how a spinlock would work reliably without being
> > able to disable scheduling: could the thread get scheduled out while
> > holding the spinlock?  Would another thread trying to take this lock then
> > spin on the sleeping task?
> 
> Yes, and it needs good justification:
> If it's a per-page spinlock, the critical section is very small, so there is very little chance that the lock holder gets scheduled out when its time slice runs out. Even if that happens once, it seems like no big issue: the waiter just wastes some CPU cycles, which is better than having the migration thread put to sleep by the mutex, I think.

Yes, I believe there was a good reason when Emilio introduced the lock; I
didn't measure it myself so I can't tell.  It's just that we need more
justification before this mutex is replaced with a spinlock.
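
(For reference, the per-page QemuSpin variant being debated would look roughly
like the sketch below.  This is a hypothetical illustration, not a proposed
patch; everything except the qemu_spin_* and test_and_clear_bit() APIs is made
up.)

#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "qemu/bitops.h"

/* Hypothetical state: a spinlock guarding the dirty bitmap, standing in for
 * the real bitmap_mutex. */
typedef struct {
    QemuSpin bitmap_lock;
    unsigned long *bitmap;
} ToyRAMState;

static void toy_init(ToyRAMState *rs, unsigned long *bitmap)
{
    qemu_spin_init(&rs->bitmap_lock);
    rs->bitmap = bitmap;
}

/* Per-page critical section: the lock is held only for one bit update, so
 * the holder is unlikely to be preempted inside it; but if it is, any waiter
 * keeps burning CPU until the holder runs again. */
static bool toy_clear_dirty(ToyRAMState *rs, unsigned long page)
{
    bool was_dirty;

    qemu_spin_lock(&rs->bitmap_lock);
    was_dirty = test_and_clear_bit(page, rs->bitmap);
    qemu_spin_unlock(&rs->bitmap_lock);

    return was_dirty;
}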

> 
> > 
> > >
> > > I think it's also better to see a comparison of migration throughput
> > > (i.e. pages per second) in the following cases before we make a decision:
> > > - per-page mutex
> > > - per-page spinlock
> > > - 50-ms mutex
> > 
> > I don't have the machines, so I can't do this.  I also see using a spinlock
> > as a separate issue to solve; as I said before, I would prefer some
> > justification first rather than blindly running tests and picking the patch
> > with the higher number.
> > 
> > I also hope we can reach a consensus that we fix one issue at a time.  This
> > patch has already proven itself with some real workloads; I hope we can
> > merge it first, either with a 50ms period or less.
> > 
> > Per-page locking already goes against at least my instinct of how this
> > should be done; I just regret that I didn't see it as an issue when
> > reviewing the balloon patch, as I offered the r-b myself.  However, I want
> > to make it work as before first, then we can figure out a better approach
> > on how the lock is taken and which lock we should use.  I don't see it as a
> > must to do all things in one patch.
> 
> Sorry, it wasn't my intention to be a barrier to merging your patch.

No problem, and I appreciate all the discussion.  It's just that, as you can
see, I still prefer to keep it like this for now.

free-page-hint is a great feature, though if we grep for it in libvirt we find
that it's not enabled anywhere yet, which means the majority isn't using it.
Meanwhile, the code path this patch changes runs for every migration that QEMU
starts.  What I'm afraid of is that we worry too much about what most people
are not using, while not making the common case work better.

From what I've learned in the past few years, funnily enough, "speed of
migration" is normally not what people care about the most.  The issues are
mostly with convergence, and with being transparent to the users of the VMs so
that they aren't even aware a migration is happening.

I admit this patch may not be the perfect one, but that's the reason I wanted
to do it in two steps: a spinlock definitely needs more justification here,
which I can't provide myself, while this patch is verified to help.
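
(To make the two steps concrete: the direction of this patch is to stop doing
a lock/unlock pair per page and instead hold bitmap_mutex across the scan,
dropping it periodically, roughly the 50ms period discussed above, so that
other users such as free-page-hint can still get in.  The sketch below is a
simplified reconstruction of that idea, not the actual patch; the function and
variable names are made up, and only the qemu_mutex_*, qemu_clock_get_ms() and
test_and_clear_bit() APIs are real.)

#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "qemu/timer.h"
#include "qemu/bitops.h"

#define TOY_LOCK_YIELD_MS 50   /* the "50ms period" discussed in this thread */

/* Hypothetical scan loop: take the mutex once for the whole pass and only
 * release it every TOY_LOCK_YIELD_MS so that another thread (e.g. the
 * free-page-hint path) can grab it in between. */
static void toy_scan_dirty_pages(QemuMutex *bitmap_mutex,
                                 unsigned long *bitmap,
                                 unsigned long nr_pages)
{
    int64_t last_yield = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

    qemu_mutex_lock(bitmap_mutex);
    for (unsigned long page = 0; page < nr_pages; page++) {
        if (test_and_clear_bit(page, bitmap)) {
            /* ... send the page ... */
        }
        int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
        if (now - last_yield >= TOY_LOCK_YIELD_MS) {
            /* Briefly drop the lock so waiters are not starved. */
            qemu_mutex_unlock(bitmap_mutex);
            qemu_mutex_lock(bitmap_mutex);
            last_yield = now;
        }
    }
    qemu_mutex_unlock(bitmap_mutex);
}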

> Just trying to come up with better solutions if possible:
> - Option 1: a QemuSpin lock would reduce the lock-acquisition overhead, but we need to test whether the migration could still converge;
> - Option 2: a conditional lock.  We haven't seen test results showing whether turning on the free page optimization with that per-page lock could still make your 2TB guest migration converge.

Yes, after I checked libvirt I am more confident the issue isn't with
free-page-hint, and as I said, I wonder why you think free-page-hint could
help a guest with heavy memory usage at all, since most of its memory is in
use.

If we want to further optimize it, I don't think it makes sense to choose
either option 1 or option 2 alone rather than taking both.  But it would be
great if whoever proposes that new change also compares it against the current
patch's 50ms period, even if that is sleepable; I'm wondering whether it would
really make a measurable difference, especially if we can also shrink the 50ms
period, which is what I asked about initially.

> 
> Seems we lack the resources for those tests right now.  If you urgently need a decision so this works first, I'm also OK with merging it.

No, I can't merge it myself, as I'm not the maintainer. :) I haven't received
any ack yet, so at least I'll need to see what Dave and Juan think.  It's just
that I don't think QemuSpin would help much in this case, and I don't want to
muddle the issue either.

Thanks for being considerate.

-- 
Peter Xu



