From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
Nitesh Narayan Lal <nitesh@redhat.com>,
kvm list <kvm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com,
Yang Zhang <yang.zhang.wz@gmail.com>,
Rik van Riel <riel@surriel.com>,
dodgen@google.com, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
dhildenb@redhat.com, Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting
Date: Tue, 19 Feb 2019 17:17:06 -0500
Message-ID: <20190219170446-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAKgT0Uc5CXZLUAqR8G4=aMmsfu2SMHfvGUjwPBmDCPT839Q-rg@mail.gmail.com>
On Tue, Feb 19, 2019 at 01:57:14PM -0800, Alexander Duyck wrote:
> On Tue, Feb 19, 2019 at 10:32 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > >>> This essentially just ends up being another trade-off of CPU versus
> > >>> memory though. Assuming we aren't using THP we are going to take a
> > >>> penalty in terms of performance but could then free individual pages
> > >>> less than HUGETLB_PAGE_ORDER, but the CPU utilization is going to be
> > >>> much higher in general even without the hinting. I figure for x86 we
> > >>> probably don't have too many options since if I am not mistaken
> > >>> MAX_ORDER is just one or two more than HUGETLB_PAGE_ORDER.
> > >>
> > >> THP is an implementation detail in the hypervisor. Yes, it is the common
> > >> case on x86. But it is e.g. not available on s390x yet. And we also want
> > >> this mechanism to work on s390x (e.g. for nested virtualization setups
> > >> as discussed).
> > >>
> > >> If we e.g. report any granularity after merging was done in the buddy,
> > >> we could end up reporting everything from page size up to MAX_SIZE - 1,
> > >> the hypervisor could ignore hints below a certain magic number, if it
> > >> makes its life easier.
> > >
> > > For each architecture we can do a separate implementation of what to
> > > hint on. We already do that for bare metal so why would we have guests
> > > do the same type of hinting in the virtualization case when there are
> > > fundamental differences in page size and features in each
> > > architecture?
> > >
> > > This is another reason why I think the hypercall approach is a better
> > > idea since each architecture is likely going to want to handle things
> > > differently and it would be a pain to try and sort that all out in a
> > > virtio driver.
> >
> > I can't follow. We are talking about something as simple as a minimum
> > page granularity here that can easily be configured. Nothing that
> > screams for different implementations. But I get your point, we could
> > tune for different architectures.
>
> I was thinking about the guest side of things. Basically if we need to
> define different page orders for different architectures then we start
> needing to do architecture specific includes. Then if we throw in
> stuff like the fact that the first level of KVM can make use of the
> host style hints then that is another thing that will be a difference
> in the different architectures.
Sorry, I didn't catch this one. What are host-style hints?
> I'm just worried this stuff is going
> to start adding up to a bunch of "#ifdef" cruft if we are trying to do
> this as a virtio driver.
I agree we want to avoid that.
By comparison, if the hint granularity is decided by the host, or is tied
to logic within the guest (such as MAX_PAGE_ORDER, as suggested by Linus)
rather than to the CPU architecture, then virtio is easier: you can reuse
config space and feature bits to negotiate host/guest capabilities. Doing
the same negotiation with hypercalls would mean adding lots of hypercalls.
I CC'd Wei Wang, who implemented the host-driven hints currently in the
balloon. Wei, I wonder: could you try changing from MAX_PAGE_ORDER to
HUGETLB_PAGE_ORDER? Does this affect performance for you at all? Thanks!
> > >
> > >>>
> > >>> As far as fragmentation my thought is that we may want to look into
> > >>> adding support to the guest for prioritizing defragmentation on pages
> > >>> lower than THP size. Then that way we could maintain the higher
> > >>> overall performance with or without the hinting since shuffling lower
> > >>> order pages around between guests would start to get expensive pretty
> > >>> quick.
> > >>
> > >> My take would be, design an interface/mechanism that allows any kind of
> > >> granularity. You can then balance between CPU overhead and space shifting.
> > >
> > > The problem with using "any kind of granularity" is that in the case
> > > of memory we are already having problems with 4K pages being deemed
> > > too small of a granularity to be useful for anything and making
> > > operations too expensive.
> >
> > No, sorry, s390x does it. And via batch reporting it could work. Not
> > saying we should do page granularity, but "to be useful for anything" is
> > just wrong.
>
> Yeah, I was engaging in a bit of hyperbole. I have had a headache this
> morning so I am a bit cranky.
>
> So I am assuming the batching is the reason why you also have an
> arch_alloc_page then for the s390 so that you can abort the hint if a
> page is reallocated before the hint is processed then? I just want to
> confirm so that my understanding of this is correct.
>
> If that is the case I would be much happier with an asynchronous page
> hint setup as this doesn't deprive the guest of memory while waiting
> on the hint. The current logic in the patches from Nitesh has the
> pages unavailable to the guest while waiting on the hint and that has
> me somewhat concerned as it is going to hurt cache locality as it will
> guarantee that we cannot reuse the same page if we are doing a cycle
> of alloc and free for the same page size.
> > >
> > > I'm open to using other page orders for other architectures. Nothing
> > > says we have to stick with THP sized pages for all architectures. I
> > > have just been focused on x86 and this seems like the best fit for the
> > > balance between CPU and freeing of memory for now on that
> > > architecture.
> > >
> > >> I feel like repeating myself, but on s390x hinting is done on page
> > >> granularity, and I have never heard somebody say "how can I turn it off,
> > >> this is slowing down my system too much.". All we know is that one
> > >> hypercall per free is most probably not acceptable. We really have to
> > >> play with the numbers.
> > >
> > > My thought was we could look at doing different implementations for
> > > other architectures such as s390 and powerPC. Odds are the
> > > implementations would be similar but have slight differences where
> > > appropriate such as what order we should start hinting on, or if we
> > > bypass the hypercall/virtio-balloon for a host native approach if
> > > available.
> > >
> > >> I tend to like an asynchronous reporting approach as discussed in this
> > >> thread, we would have to see if Nitesh could get it implemented.
> > >
> > > I agree it would be great if it could work. However I have concerns
> > > given that work on this patch set dates back to 2017, major issues
> > > such as working around device assignment have yet to be addressed, and
> > > it seems like most of the effort is being focused on things that in my
> > > opinion are being over-engineered for little to no benefit.
> >
> > I can understand that you are trying to push your solution. I would do
> > the same. Again, I don't like a pure synchronous approach that works on
> > one-element-at-a-time. Period. Other people might have other opinions.
> > This is mine - luckily I don't have anything to say here :)
> >
> > MST also voted for an asynchronous solution if we can make it work.
> > Nitesh has made significant improvements since 2017. Complicated stuff
> > needs time. No need to rush. People have been talking about free page
> > hinting since 2006. I talked to various people that experimented with
> > bitmap based solutions two years ago.
>
> Now that I think I have a better understanding of how the s390x is
> handling this I'm beginning to come around to the idea of an
> asynchronous setup. The one thing that has been bugging me about the
> asynchronous approach is the fact that the pages are not available to
> the guest while waiting on the hint to be completed. If we can do
> something like an arch_alloc_page and that would abort the hint and
> allow us to keep the page available while waiting on the hint that
> would be my preferred way of handling this.
>
> > So much for that: if you think your solution is the way to go, please
> > follow up on it. Nitesh seems to have decided to look into the
> > asynchronous approach you also called "great if it could work". As long
> > as we don't run into elementary blockers there, to me it all looks like
> > we are making progress, which is good. If we find out asynchronous
> > doesn't work, synchronous is the only alternative.
>
> I plan to follow up in the next week or so.
>
> > And just so you don't get me wrong: Thanks for looking and working on
> > this. And thanks for sharing your opinions and insights! However making
> > a decision about going your way at this point does not seem reasonable
> > to me. We have plenty of time.
>
> I appreciate the feedback. Sorry if I seemed a bit short. As I
> mentioned I've had a headache most of the morning which hasn't really
> helped my mood.
>
> Thanks.
>
> - Alex
--
MST