From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"amit.shah@redhat.com" <amit.shah@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH RFC kernel] balloon: speed up inflating/deflating process
Date: Tue, 24 May 2016 14:11:35 +0300
Message-ID: <20160524111135.GA7392@redhat.com>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E041A4D21@shsmsx102.ccr.corp.intel.com>

On Tue, May 24, 2016 at 10:38:43AM +0000, Li, Liang Z wrote:
> > > > >  {
> > > > > -	struct scatterlist sg;
> > > > >  	unsigned int len;
> > > > >
> > > > > -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > > > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> > > > > +		u32 page_shift = PAGE_SHIFT;
> > > > > +		unsigned long start_pfn, end_pfn, flags = 0, bmap_len;
> > > > > +		struct scatterlist sg[5];
> > > > > +
> > > > > +		start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> > > > > +		end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> > > > > +		bmap_len = (end_pfn - start_pfn) / BITS_PER_LONG * sizeof(long);
> > > > > +
> > > > > +		sg_init_table(sg, 5);
> > > > > +		sg_set_buf(&sg[0], &flags, sizeof(flags));
> > > > > +		sg_set_buf(&sg[1], &start_pfn, sizeof(start_pfn));
> > > > > +		sg_set_buf(&sg[2], &page_shift, sizeof(page_shift));
> > > > > +		sg_set_buf(&sg[3], &bmap_len, sizeof(bmap_len));
> > > > > +		sg_set_buf(&sg[4], vb->page_bitmap +
> > > > > +				 (start_pfn / BITS_PER_LONG), bmap_len);
> > > >
> > > > This can be pre-initialized, correct?
> > >
> > > Pre-initialized? I don't quite understand what you mean.
> > 
> > I think you can maintain sg as part of device state and init sg with the bitmap.
> > 
> 
> I got it.
> 
> > > > This is grossly inefficient if you only requested a single page.
> > > > And it's also allocating memory very aggressively without ever
> > > > telling the host what is going on.
> > >
> > > If only a single page is requested, there is no need to send the entire
> > > page bitmap. This RFC patch already takes this into account.
> > 
> > where's that addressed in code?
> > 
> 
> By recording the start_pfn and end_pfn.
> 
> The start_pfn & end_pfn will be updated in set_page_bitmap()
> and will be used in the function tell_host():
> 
> ---------------------------------------------------------------------------------
> +static void set_page_bitmap(struct virtio_balloon *vb, struct page *page)
> +{
> +	unsigned int i;
> +	unsigned long *bitmap = vb->page_bitmap;
> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +	for (i = 0; i < VIRTIO_BALLOON_PAGES_PER_PAGE; i++)
> +		set_bit(balloon_pfn + i, bitmap);

BTW, there's a page size value in the header, so there
is no longer any need to set multiple bits per page.

> +	if (balloon_pfn < vb->start_pfn)
> +		vb->start_pfn = balloon_pfn;
> +	if (balloon_pfn > vb->end_pfn)
> +		vb->end_pfn = balloon_pfn;
> +}

Sounds good, but you also need to limit by
allocated bitmap size.
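
A minimal userspace sketch of the limit described above, not code from
the patch. It assumes one bit per page and a 32 Kbyte bitmap (the size
floated elsewhere in this thread); struct pfn_range, pfn_range_reset()
and pfn_range_add() are hypothetical names used only for illustration.
The caller would tell the host and reset the range whenever
pfn_range_add() refuses a page.

```c
#include <stdbool.h>

/* Assumed limit: a 32 Kbyte bitmap holds 32 * 1024 * 8 bits, one per page. */
#define BMAP_BITS (32UL * 1024 * 8)

/* Hypothetical tracker for the lowest and highest pfn seen so far. */
struct pfn_range {
	unsigned long min_pfn;
	unsigned long max_pfn;
	bool empty;
};

static void pfn_range_reset(struct pfn_range *r)
{
	r->min_pfn = 0;
	r->max_pfn = 0;
	r->empty = true;
}

/*
 * Try to widen the range so that it includes pfn.
 * Returns false and leaves the range unchanged when the resulting span
 * would need more than BMAP_BITS bits; the caller is then expected to
 * report the current range to the host, reset, and retry.
 */
static bool pfn_range_add(struct pfn_range *r, unsigned long pfn)
{
	unsigned long lo, hi;

	if (r->empty) {
		r->min_pfn = pfn;
		r->max_pfn = pfn;
		r->empty = false;
		return true;
	}

	lo = pfn < r->min_pfn ? pfn : r->min_pfn;
	hi = pfn > r->max_pfn ? pfn : r->max_pfn;

	/* The span [lo, hi] needs hi - lo + 1 bits. */
	if (hi - lo >= BMAP_BITS)
		return false;

	r->min_pfn = lo;
	r->max_pfn = hi;
	return true;
}
```

How often the refusal path triggers depends on how scattered the
allocated pages are, which is the overhead that would need measuring.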

> 
> +		unsigned long start_pfn, end_pfn, flags = 0, bmap_len;
> +		struct scatterlist sg[5];
> +
> +		start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> +		end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> +		bmap_len = (end_pfn - start_pfn) / BITS_PER_LONG * sizeof(long);
> +
> +		sg_init_table(sg, 5);
> +		sg_set_buf(&sg[0], &flags, sizeof(flags));
> +		sg_set_buf(&sg[1], &start_pfn, sizeof(start_pfn));
> +		sg_set_buf(&sg[2], &page_shift, sizeof(page_shift));
> +		sg_set_buf(&sg[3], &bmap_len, sizeof(bmap_len));
> +		sg_set_buf(&sg[4], vb->page_bitmap +
> +				 (start_pfn / BITS_PER_LONG), bmap_len);

Looks wrong. start_pfn should start at offset 0 I think ...
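
One possible reading of this comment, sketched in userspace C: with a
size-limited bitmap, bit 0 stands for start_pfn itself, so the buffer
handed to the host always begins at offset 0 of the bitmap. This is an
illustration under that assumption, not the patch's code; struct
rel_bitmap, the rel_bitmap_*() helpers and the small BMAP_LONGS value
are all hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BMAP_LONGS    64	/* deliberately small, for illustration only */

/* Hypothetical bitmap whose bit 0 corresponds to start_pfn. */
struct rel_bitmap {
	unsigned long start_pfn;
	unsigned long bits[BMAP_LONGS];
};

static void rel_bitmap_init(struct rel_bitmap *b, unsigned long start_pfn)
{
	b->start_pfn = start_pfn;
	memset(b->bits, 0, sizeof(b->bits));
}

/* Mark pfn as ballooned. Returns 0 on success, -1 if pfn is out of range. */
static int rel_bitmap_set(struct rel_bitmap *b, unsigned long pfn)
{
	unsigned long off;

	if (pfn < b->start_pfn)
		return -1;

	off = pfn - b->start_pfn;
	if (off >= BMAP_LONGS * BITS_PER_LONG)
		return -1;

	b->bits[off / BITS_PER_LONG] |= 1UL << (off % BITS_PER_LONG);
	return 0;
}

/*
 * Number of bytes, counted from offset 0 of the bitmap, needed to cover
 * every pfn up to and including end_pfn (rounded up to whole longs).
 * end_pfn must not be below start_pfn.
 */
static size_t rel_bitmap_len(const struct rel_bitmap *b, unsigned long end_pfn)
{
	unsigned long off = end_pfn - b->start_pfn;

	return (off / BITS_PER_LONG + 1) * sizeof(unsigned long);
}
```

With this layout a single ballooned page costs one long of bitmap plus
the header, whatever its absolute pfn is.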

> +		virtqueue_add_outbuf(vq, sg, 5, vb, GFP_KERNEL);
> -------------------------------------------------------------------------------------------
> > > But it can work very well when requesting several pages that span a
> > > large range.
> > 
> > Some kind of limit on range would make sense though.
> > It need not cover max pfn.
> > 
> 
> Yes, agree.
> 
> > > > Suggestion to address all above comments:
> > > > 	1. allocate a bunch of pages and link them up,
> > > > 	   calculating the min and the max pfn.
> > > > 	   if max-min exceeds the allocated bitmap size,
> > > > 	   tell host.
> > >
> > > I am not sure it works well in some cases, e.g. when the allocated pages
> > > are spread across a wide range and max-min > limit is true very
> > > frequently. Then there will be many virtio transmissions, which is bad
> > > for performance. Right?
> > 
> > It's a tradeoff for sure. Measure it, see what the overhead is.
> 
> OK, I will try and get back to you.
> 
> > 
> > >
> > > > 	2. limit allocated bitmap size to something reasonable.
> > > > > 	   How about 32Kbytes? This is 256 Kbits in the map, which comes
> > > > > 	   out to 1 Gbyte of memory in the balloon.
> > >
> > > So even if the VM has 1TB of RAM, the page bitmap will take 32MB of
> > > memory. Maybe it's better to use one big page bitmap to save the pages
> > > allocated by the balloon, split it into 32K-byte units, and transfer
> > > one unit at a time.
> > 
> > How is this different from what I said?
> > 
> 
> It's good if it's the same as you said.
> 
> Thanks!
> Liang
> 
> > >
> > > Should we use a page bitmap to replace 'vb->pages' ?
> > >
> > > How about falling back to PFNs if the number of requested pages is
> > > small?
> > >
> > > Liang
> > 
> > That's why we have start pfn. You can use that to pass even a single page
> > without a lot of overhead.
> > 
> > > > > --
> > > > > 1.9.1
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org More majordomo
> > > > info at http://vger.kernel.org/majordomo-info.html
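
For reference, the sizing figures quoted in this thread (a 32 Kbyte
bitmap covering 1 Gbyte, and 1TB of RAM needing a 32MB bitmap) can be
checked with the small sketch below. It assumes 4 Kbyte pages and one
bit per page; the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL	/* assumption: 4 Kbyte pages */

/* Bytes of guest memory covered by a bitmap of bmap_bytes bytes. */
static uint64_t bitmap_coverage_bytes(uint64_t bmap_bytes)
{
	return bmap_bytes * 8 * PAGE_SIZE_4K;
}

/* Bytes of bitmap needed to cover ram_bytes of guest memory. */
static uint64_t bitmap_bytes_for_ram(uint64_t ram_bytes)
{
	uint64_t pages = ram_bytes / PAGE_SIZE_4K;

	return (pages + 7) / 8;	/* round up to whole bytes */
}
```

Under these assumptions bitmap_coverage_bytes(32 * 1024) gives 2^30
bytes and bitmap_bytes_for_ram(2^40) gives 32 * 2^20 bytes, matching
both numbers above.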
