From: Wei Wang <wei.w.wang@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
quintela@redhat.com, dgilbert@redhat.com, pbonzini@redhat.com,
liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
quan.xu0@gmail.com, nilal@redhat.com, riel@redhat.com,
zhang.zhanghailiang@huawei.com
Subject: Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Date: Tue, 05 Jun 2018 21:22:51 +0800
Message-ID: <5B168EAB.7090607@intel.com>
In-Reply-To: <20180605065857.GC9216@xz-mi>

On 06/05/2018 02:58 PM, Peter Xu wrote:
> On Mon, Jun 04, 2018 at 04:04:51PM +0800, Wei Wang wrote:
>> On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
>>>> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>>>>>> +/*
>>>>>> + * Balloon will report pages which were free at the time of this call. As the
>>>>>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>>>>>> + * this call is made.
>>>>>> + */
>>>>>> +void balloon_free_page_start(void)
>>>>>> +{
>>>>>> + balloon_free_page_start_fn(balloon_opaque);
>>>>>> +}
>>>>> Please create notifier support, not a single global.
>>>> OK. The start is called at the end of bitmap_sync, and the stop is called at
>>>> the beginning of bitmap_sync. In this case, we will need to add two
>>>> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
>>>> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
>> Peter, do you have any thought about this?
>>
>> Currently, the usage of free page optimization isn't limited to the first
>> stage. It is used in each stage. A global call to start the free page
>> optimization is made after bitmap sync, and another global call to stop the
>> optimization is made before bitmap sync. It is simple to just use global
>> calls.
>>
>> If we change the implementation to use notifiers, I think we will need to
>> add two new MigrationStatus as above. Would you think that is worthwhile for
>> some reason?
> I'm a bit confused. Could you elaborate why we need those extra
> states?
Sure. Notifiers fire when an event happens. In this case the event would
be a migration state change, which invokes the state change callback. So
I think we would need to add two new states, one to trigger the start
callback and one to trigger the stop callback.
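
To make the notifier option concrete, here is a minimal sketch of the idea. It is not QEMU's notifier API and not code from this series: the event names, SyncNotifier, SyncNotifierList and BalloonSketch are all hypothetical names used for illustration. It only shows a listener being told "before sync" and "after sync" instead of being called through a single global.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events, standing in for the two proposed new states. */
typedef enum {
    SYNC_EVENT_BEFORE_BITMAP_SYNC, /* old round ends: stop hinting */
    SYNC_EVENT_AFTER_BITMAP_SYNC,  /* new round begins: start hinting */
} SyncEvent;

typedef struct SyncNotifier SyncNotifier;
struct SyncNotifier {
    void (*notify)(SyncNotifier *n, SyncEvent ev);
    SyncNotifier *next;
};

typedef struct {
    SyncNotifier *head;
} SyncNotifierList;

/* Register a listener at the head of the list. */
static void sync_notifier_add(SyncNotifierList *list, SyncNotifier *n)
{
    assert(list != NULL && n != NULL);
    n->next = list->head;
    list->head = n;
}

/* Deliver one event to every registered listener. */
static void sync_notifier_notify(SyncNotifierList *list, SyncEvent ev)
{
    for (SyncNotifier *n = list->head; n != NULL; n = n->next) {
        n->notify(n, ev);
    }
}

/* Stand-in for the balloon device (hypothetical). */
typedef struct {
    SyncNotifier notifier; /* kept first so the cast below is valid */
    int hinting_active;
    int rounds_started;
} BalloonSketch;

static void balloon_sketch_notify(SyncNotifier *n, SyncEvent ev)
{
    BalloonSketch *b = (BalloonSketch *)n;

    if (ev == SYNC_EVENT_BEFORE_BITMAP_SYNC) {
        b->hinting_active = 0;
    } else if (ev == SYNC_EVENT_AFTER_BITMAP_SYNC) {
        b->hinting_active = 1;
        b->rounds_started++;
    }
}

static void balloon_sketch_register(SyncNotifierList *list, BalloonSketch *b)
{
    b->notifier.notify = balloon_sketch_notify;
    b->notifier.next = NULL;
    b->hinting_active = 0;
    b->rounds_started = 0;
    sync_notifier_add(list, &b->notifier);
}
```

With this shape the migration code would only call sync_notifier_notify() around the bitmap sync and would not need to know about the balloon at all, which is the trade-off against the simpler global calls.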
> Or, to ask a more general question - could you elaborate a bit on how
> you order these operations? I would be really glad if you can point
> me to some documents for the feature. Is there any latest virtio
> document that I can refer to (or old cover letter links)? It'll be
> good if the document could mention about things like:
I haven't written documentation for it yet. It's planned to be ready
after this code series is done. But I'm glad to answer the questions below.
>
> - why we need this feature? Is that purely for migration purpose? Or
> it can be used somewhere else too?
Yes. Migration is the use case that currently benefits a lot from this
feature. I haven't thought of others so far. It is common for new
features to start with just one or two typical use cases.
> - high level stuff about how this is implemented, e.g.:
> - the protocol of the new virtio queues
> - how we should get the free page hints (please see below)
At a high level, the flow is:
1. the host sends a start cmd id to the guest;
2. the guest starts a new round of reporting by sending the cmd id plus
free page hints to the host;
3. the QEMU-side optimization code applies the free page hints (filters
them out of the dirty bitmap) only when the reported cmd id matches the
one that was just sent.
The protocol was suggested by Michael and has been thoroughly discussed
when upstreaming the kernel part. It might not be necessary to go over
that again :)
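
For readers who did not follow the kernel discussion, the cmd id check in step 3 can be sketched roughly as below. This is an illustration under assumptions, not the code in this series: HintFilter, the 64-page bitmap and all function names are hypothetical, and the real dirty bitmap is of course much larger than one word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NPAGES 64u /* hypothetical guest size, in pages */

typedef struct {
    uint64_t dirty_bitmap;   /* one bit per page, 1 = still to be sent */
    uint32_t current_cmd_id; /* id sent with the latest start command */
    bool active;             /* true between start and stop */
} HintFilter;

/* Step 1: the host picks a fresh id and (conceptually) sends it to the guest. */
static uint32_t hint_filter_start(HintFilter *f)
{
    f->current_cmd_id++;
    f->active = true;
    return f->current_cmd_id;
}

static void hint_filter_stop(HintFilter *f)
{
    f->active = false;
}

/*
 * Steps 2 and 3: the guest reports (cmd_id, page range). The range is
 * cleared from the dirty bitmap only if the id matches the one just sent.
 * Returns true if the hint was applied, false if it was dropped.
 */
static bool hint_filter_apply(HintFilter *f, uint32_t cmd_id,
                              unsigned first_page, unsigned npages)
{
    if (!f->active || cmd_id != f->current_cmd_id) {
        return false; /* stale or unexpected hint */
    }
    assert(first_page + npages <= SKETCH_NPAGES);
    for (unsigned i = first_page; i < first_page + npages; i++) {
        f->dirty_bitmap &= ~(UINT64_C(1) << i);
    }
    return true;
}
```

A hint that carries the id of an earlier round is simply ignored, so the host never has to wait for the guest to acknowledge a stop.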
I would suggest focusing on the supplied interface and its usage in live
migration. That is, we now have two APIs, start() and stop(), to start
and stop the optimization.
1) where in the migration code should we use them (do you agree with
steps (1), (2), (3) that you summarized below?)
2) how should we use them: through direct global calls or via notifiers?
>
> For now, what I see is that we do:
>
> (1) stop hinting
> (2) sync bitmap
> (3) start hinting
>
> Why this order?
We start filtering free pages out of the dirty bitmap only once all
the dirty bits are in place, i.e. after the bitmap sync. To some degree,
synchronizing the bitmap marks the end of the previous round and the
beginning of the new one, so we stop the free page optimization for the
old round when that round ends.
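
As a rough illustration of that ordering (a sketch only; RoundState and the sketch_* functions are hypothetical names, and pending_dirty_log merely stands in for whatever the dirty tracker has accumulated since the last sync):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t pending_dirty_log; /* pages dirtied since the last sync */
    uint64_t migration_bitmap;  /* pages that still have to be sent */
    bool hinting;               /* free page optimization running? */
} RoundState;

static void sketch_stop_hinting(RoundState *s)
{
    s->hinting = false;
}

/* Fold the dirty log into the migration bitmap and reset the log. */
static void sketch_sync_bitmap(RoundState *s)
{
    s->migration_bitmap |= s->pending_dirty_log;
    s->pending_dirty_log = 0;
}

static void sketch_start_hinting(RoundState *s)
{
    s->hinting = true;
}

/* A free page hint clears one bit, but only while hinting is running. */
static bool sketch_apply_hint(RoundState *s, unsigned page)
{
    assert(page < 64);
    if (!s->hinting) {
        return false;
    }
    s->migration_bitmap &= ~(UINT64_C(1) << page);
    return true;
}

/* One round boundary: (1) stop hinting, (2) sync bitmap, (3) start hinting. */
static void sketch_begin_round(RoundState *s)
{
    sketch_stop_hinting(s);
    sketch_sync_bitmap(s);
    sketch_start_hinting(s);
}
```

If a hinted page is written again later, the write lands in the dirty log and the next sync puts the bit back, which is why dirty logging has to be enabled before hinting starts.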
> My understanding is that obviously there is a race
> between the page hinting thread and the dirty bitmap tracking part
> (which is done in KVM). How do we make sure there is no race?
Could you please explain more about the race you saw? (Free pages are
reported by the guest, and the bitmap is tracked in KVM.)
>
> A direct question is that, do we need to make sure step (1) must be
> before step (2)? Asked since I see that currently step (1) is an
> async operation (taking a lock, set status, then return). Then would
> such an async operation satisfy any ordering requirement after all?
Yes. Step (1) guarantees that the QEMU-side optimization call has
exited (we don't need to rely on a guest-side ACK because the guest could
be in any state). This is enough. If the guest continues to report after
that, the hints it reports will be detected as stale and dropped at
the next start of optimization.
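
One way to picture why "take a lock, set status, return" is enough: if the code that applies hints holds the same lock while it touches the bitmap, then stop cannot return while an apply is in flight. The sketch below assumes exactly that, using a plain pthread mutex; LockedHintState and the locked_hint_* names are hypothetical and this is not the code in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { HINT_STATUS_STOPPED, HINT_STATUS_STARTED } HintStatus;

typedef struct {
    pthread_mutex_t lock;  /* protects status and dirty_bitmap */
    HintStatus status;
    uint64_t dirty_bitmap; /* one bit per page, 1 = still to be sent */
} LockedHintState;

static void locked_hint_init(LockedHintState *s, uint64_t dirty)
{
    pthread_mutex_init(&s->lock, NULL);
    s->status = HINT_STATUS_STOPPED;
    s->dirty_bitmap = dirty;
}

static void locked_hint_start(LockedHintState *s)
{
    pthread_mutex_lock(&s->lock);
    s->status = HINT_STATUS_STARTED;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Returns only after any in-flight locked_hint_apply() has released
 * the lock, so no hint is applied once this function has returned.
 */
static void locked_hint_stop(LockedHintState *s)
{
    pthread_mutex_lock(&s->lock);
    s->status = HINT_STATUS_STOPPED;
    pthread_mutex_unlock(&s->lock);
}

/* Called from the hinting thread for each reported free page. */
static bool locked_hint_apply(LockedHintState *s, unsigned page)
{
    bool applied = false;

    assert(page < 64);
    pthread_mutex_lock(&s->lock);
    if (s->status == HINT_STATUS_STARTED) {
        s->dirty_bitmap &= ~(UINT64_C(1) << page);
        applied = true;
    }
    pthread_mutex_unlock(&s->lock);
    return applied;
}
```

Nothing here waits for the guest: hints that arrive after the stop are rejected by the status check, and hints from an old round are rejected later by the cmd id comparison.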
>
> Btw, I would appreciate if you can push your new trees (both QEMU and
> kernel) to the links you mentioned in the cover letter - I noticed
> that they are not the same as what you have posted on the list.
>
Sure.
For the kernel part, you can get it from linux-next:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
For the v7 QEMU part:
git://github.com/wei-w-wang/qemu-free-page-hint.git
(My connection to github is too slow, so it should be ready within
24 hours. I can also send you the raw patches via email if you need them.)
Best,
Wei