From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: "virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
Tyler Sanderson <tysand@google.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Nadav Amit <namit@vmware.com>,
David Rientjes <rientjes@google.com>,
Alexander Duyck <alexander.h.duyck@linux.intel.com>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: Balloon pressuring page cache
Date: Wed, 5 Feb 2020 05:25:12 -0500
Message-ID: <20200205051404-mutt-send-email-mst@kernel.org>
In-Reply-To: <ef1dfd46-49c8-8aa9-2a5e-d2ebb2e093f5@redhat.com>
On Wed, Feb 05, 2020 at 10:58:14AM +0100, David Hildenbrand wrote:
> On 05.02.20 10:49, Wang, Wei W wrote:
> > On Wednesday, February 5, 2020 5:37 PM, David Hildenbrand wrote:
> >>>
> >>> Not sure how TCG tracks the dirty bits. But in whatever
> >>> implementation, the hypervisor should have
> >>
> >> There is only a single bitmap for that purpose. (well, the one where KVM
> >> syncs to)
> >>
> >>> already dealt with the race between the current round and the previous
> >>> round dirty recording.
> >>> (the race isn't brought by this feature essentially)
> >>
> >> It is guaranteed to work reliably without this feature as you only clear what
> >> *has been migrated*,
> >
> > Not "clear what has been migrated" (that skips nothing..)
> > Anyway, it's a hint used for optimization.
>
> Yes, an optimization that might easily lead to data corruption when the
> two bitmaps are either not in place or don't play along in that specific
> way (and I suspect this is the case under TCG).
So I checked, and TCG has two copies too.
Each block has a bmap used for migration, and there is also dirty_memory,
where pages are marked dirty. See cpu_physical_memory_sync_dirty_bitmap.
So from QEMU POV, there is a callback that tells balloon when it's safe
to request hints. As that affects the bitmap, that must not happen in
parallel with dirty bitmap handling. Sounds like a reasonable
limitation.
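To make the two-copy argument concrete, here is a minimal C model. It is
a sketch, not QEMU code: the struct and the model_* functions are made up
for illustration, only the names dirty_memory and bmap come from the
description above, and the hints_allowed flag is an assumed stand-in for
the callback that tells the balloon when it is safe to request hints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8 /* hypothetical tiny RAM block */

/* Hypothetical model of one RAM block with two dirty-tracking copies. */
struct ram_block_model {
    bool dirty_memory[NPAGES]; /* writes are recorded here first */
    bool bmap[NPAGES];         /* pages migration still has to send */
    bool hints_allowed;        /* assumed "safe to hint" state */
};

/* A guest write only touches dirty_memory, never bmap directly. */
static void model_mark_dirty(struct ram_block_model *rb, size_t page)
{
    assert(page < NPAGES);
    rb->dirty_memory[page] = true;
}

/*
 * Sync step: fold dirty_memory into bmap and clear dirty_memory.
 * Returns how many bits were newly set in bmap.
 */
static size_t model_sync(struct ram_block_model *rb)
{
    size_t newly_set = 0;

    for (size_t i = 0; i < NPAGES; i++) {
        if (!rb->dirty_memory[i]) {
            continue;
        }
        if (!rb->bmap[i]) {
            rb->bmap[i] = true;
            newly_set++;
        }
        rb->dirty_memory[i] = false;
    }
    return newly_set;
}

/*
 * Free page hint: drop the page from bmap so it is not sent.
 * Refused (returns false) while hinting is not allowed, which models
 * "must not happen in parallel with dirty bitmap handling".
 */
static bool model_apply_hint(struct ram_block_model *rb, size_t page)
{
    assert(page < NPAGES);
    if (!rb->hints_allowed) {
        return false;
    }
    rb->bmap[page] = false;
    return true;
}
```

In this model a hinted page that the guest writes afterwards lands in
dirty_memory first, so the next sync puts it back into bmap and it still
gets migrated. The flag only exists to keep a hint from being applied
while a sync is working on the same bits.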
The hint can be useful outside migration, but in its current form
it then needs to be non-destructive.
E.g. I can imagine userspace calling MADV_SOFT_OFFLINE on the hinted
memory.
Again, a flag that tells the guest it should wait until used
could be a reasonable extension. If we stick to the shrinker
it's actually easy to implement. With an OOM notifier - I'm not so
sure ...
And a big part of the problem is that after all this time the page
hinting interfaces are still undocumented. Quite sad really :(
> --
> Thanks,
>
> David / dhildenb