Re: [RFC for Linux] virtio_balloon: Add VIRTIO_BALLOON_F_THP_ORDER to handle THP spilt issue

From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Hui Zhu <teawater@gmail.com>,
	jasowang@redhat.com, akpm@linux-foundation.org,
	pagupta@redhat.com, mojha@codeaurora.org, namit@vmware.com,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	Hui Zhu <teawaterz@linux.alibaba.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>
Subject: Re: [RFC for Linux] virtio_balloon: Add VIRTIO_BALLOON_F_THP_ORDER to handle THP spilt issue
Date: Tue, 31 Mar 2020 11:28:41 -0400
Message-ID: <20200331112730-mutt-send-email-mst@kernel.org>
In-Reply-To: <00dc8bad-05e5-6085-525c-ce9fded672cc@redhat.com>

On Tue, Mar 31, 2020 at 04:34:48PM +0200, David Hildenbrand wrote:
> On 31.03.20 16:29, David Hildenbrand wrote:
> > On 31.03.20 16:18, Michael S. Tsirkin wrote:
> >> On Tue, Mar 31, 2020 at 04:09:59PM +0200, David Hildenbrand wrote:
> >>
> >> ...
> >>
> >>>>>>>>>>>>>> So if we want to address this, IMHO this calls for a new API.
> >>>>>>>>>>>>>> Along the lines of
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    struct page *alloc_page_range(gfp_t gfp, unsigned int min_order,
> >>>>>>>>>>>>>>                    unsigned int max_order, unsigned int *order)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> the idea would then be to return a number of pages in the given
> >>>>>>>>>>>>>> range.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think? Want to try implementing that?
> >>
> >> ..
> >>
> >>> I expect the whole "steal huge pages from your guest" to be problematic,
> >>> as I already mentioned to Alex. This needs a performance evaluation.
> >>>
> >>> This all smells like a lot of workload dependent fine-tuning. :)
> >>
> >>
> >> So that's why I proposed the API above.
> >>
> >> The idea is that *if we are allocating a huge page anyway*,
> >> rather than break it up let's send it whole to the device.
> >> If we have smaller pages, return smaller pages.
> >>
> > 
> > Sorry, I still fail to see why you cannot do that with my version of
> > balloon_pages_alloc(). But maybe I haven't understood the magic you
> > expect to happen in alloc_page_range() :)
> > 
> > It's just going via a different inflate queue once we have that page, as
> > I stated in front of my draft patch "but with an
> > optimized reporting interface".
> > 
> >> That seems like it would always be an improvement, whatever the
> >> workload.
> >>
> > 
> > Don't think so. Assume there are plenty of 4k pages lying around. It
> > might actually be *bad* for guest performance if you take a huge page
> > instead of all the leftover 4k pages that cannot be merged. Only at the
> > point where you would otherwise break a bigger page up and report it in
> > pieces would it definitely make no difference.
> 
> I just understood what you mean :) and now it makes sense - it avoids
> exactly that. Basically
> 
> 1. Try to allocate order-0. No split necessary? return the page
> 2. Try to allocate order-1. No split necessary? return the page
> ...
> 
> up to MAX_ORDER - 1.
> 
> Yeah, I guess this will need a new kernel API.

Exactly what I meant. And whenever we fail and block for reclaim, we
restart this.
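
To make the steps above concrete, here is a rough userspace sketch of
the search order, not kernel code. Everything in it is an assumption
made for illustration: struct toy_zone, TOY_MAX_ORDER, the toy_*
function names, the int return convention, and the reclaim callback
are all hypothetical, and the free lists are modelled as plain
counters. It only shows the idea: walk the orders from min_order
upwards, hand back the first block that is free at its own order so
nothing gets split, and restart from the bottom after reclaim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit for this toy model: orders 0 .. TOY_MAX_ORDER - 1. */
#define TOY_MAX_ORDER 11

/* Toy stand-in for per-order free lists: free_count[o] is the number of
 * free blocks of exactly order o. */
struct toy_zone {
	unsigned int free_count[TOY_MAX_ORDER];
};

/*
 * Sketch of the alloc_page_range() idea: take the smallest-order block
 * that is already free in [min_order, max_order]. A block is only ever
 * taken from its own order, so no larger block is split.
 * Returns 0 and stores the chosen order in *order, or -1 if no order in
 * the range has a free block.
 */
static int toy_alloc_page_range(struct toy_zone *z, unsigned int min_order,
				unsigned int max_order, unsigned int *order)
{
	assert(min_order <= max_order && max_order < TOY_MAX_ORDER);

	for (unsigned int o = min_order; o <= max_order; o++) {
		if (z->free_count[o] > 0) {
			z->free_count[o]--;
			*order = o;
			return 0;
		}
	}
	return -1;
}

/*
 * Restart-after-reclaim wrapper. The reclaim callback is hypothetical:
 * it returns nonzero if it freed something. After each successful
 * reclaim the search restarts from min_order.
 */
static int toy_alloc_with_restart(struct toy_zone *z, unsigned int min_order,
				  unsigned int max_order, unsigned int *order,
				  int (*reclaim)(struct toy_zone *))
{
	for (;;) {
		if (toy_alloc_page_range(z, min_order, max_order, order) == 0)
			return 0;
		if (reclaim == NULL || !reclaim(z))
			return -1;
	}
}
```

With one free order-0 block and one free order-9 block, two calls with
the range 0..9 would return order 0 first and then the whole order-9
block, rather than splitting the order-9 block into smaller pieces.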

> 
> -- 
> Thanks,
> 
> David / dhildenb
