Linux virtualization list


* RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04 14:26 UTC (permalink / raw)
  To: Roman Kagan
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mst@redhat.com, linux-kernel@vger.kernel.org,
	Dr. David Alan Gilbert, linux-mm@kvack.org, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304102346.GB2479@rkaganb.sw.ru>

> Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration
> optimization
> 
> On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > > >   I wonder if it would be possible to avoid the kernel changes
> > > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > > the same result?
> > > > >
> > > > > Only detect the unmapped/zero mapped pages is not enough.
> > > > > Consider the situation like case 2, it can't achieve the same
> > > > > result.
> > >
> > > Your case 2 doesn't exist in the real world.  If people could stop
> > > their main memory consumer in the guest prior to migration they
> > > wouldn't need live migration at all.
> >
> > The case 2 is just a simplified scenario, not a real case.
> > As long as the guest's memory usage does not keep increasing, or not
> > always run out, it can be covered by the case 2.
> 
> The memory usage will keep increasing due to ever growing caches, etc, so
> you'll be left with very little free memory fairly soon.
> 

I don't think so.

> > > I tend to think you can safely assume there's no free memory in the
> > > guest, so there's little point optimizing for it.
> >
> > If this is true, we should not inflate the balloon either.
> 
> We certainly should if there's "available" memory, i.e. not free but cheap to
> reclaim.
> 

What do you mean by "available" memory? If it is not free, I don't think it is cheap to reclaim.

> > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > that's made up, in particular, by the ballon, and consider inflating
> > > the balloon right before migration unless you already maintain it at
> > > the optimal size for other reasons (like e.g. a global resource
> > > manager optimizing the VM density).
> > >
> >
> > Yes, I believe the current balloon works and it's simple. Do you take the
> > performance impact for consideration?
> > For and 8G guest, it takes about 5s to  inflating the balloon. But it
> > only takes 20ms to  traverse the free_list and construct the free pages
> > bitmap.
> 
> I don't have any feeling of how important the difference is.  And if the
> limiting factor for balloon inflation speed is the granularity of communication
> it may be worth optimizing that, because quick balloon reaction may be
> important in certain resource management scenarios.
> 
> > By inflating the balloon, all the guest's pages are still be processed
> > (zero page checking).
> 
> Not sure what you mean.  If you describe the current state of affairs that's
> exactly the suggested optimization point: skip unmapped pages.
> 

You'd better check the live migration code.

> > The only advantage of ' inflating the balloon before live migration' is
> > simple, nothing more.
> 
> That's a big advantage.  Another one is that it does something useful in
> real-world scenarios.
> 

I don't think a heavy performance impact is something useful in real-world scenarios.

Liang
> Roman.

^ permalink raw reply

* [PATCH V4 3/3] vhost_net: basic polling support
From: Jason Wang @ 2016-03-04 11:24 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: yang.zhang.wz, borntraeger, RAPOPORT
In-Reply-To: <1457090693-55974-1-git-send-email-jasowang@redhat.com>

This patch tries to poll for newly added tx buffers or the socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling is specified through a new kind of vring ioctl.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c        | 78 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.c      | 14 +++++++++
 drivers/vhost/vhost.h      |  1 +
 include/uapi/linux/vhost.h |  6 ++++
 4 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e..eefdbc7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -287,6 +287,43 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_unlock_bh();
 }
 
+static inline unsigned long busy_clock(void)
+{
+	return local_clock() >> 10;
+}
+
+static bool vhost_can_busy_poll(struct vhost_dev *dev,
+				unsigned long endtime)
+{
+	return likely(!need_resched()) &&
+	       likely(!time_after(busy_clock(), endtime)) &&
+	       likely(!signal_pending(current)) &&
+	       !vhost_has_work(dev);
+}
+
+static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
+				    struct vhost_virtqueue *vq,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num)
+{
+	unsigned long uninitialized_var(endtime);
+	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+				    out_num, in_num, NULL, NULL);
+
+	if (r == vq->num && vq->busyloop_timeout) {
+		preempt_disable();
+		endtime = busy_clock() + vq->busyloop_timeout;
+		while (vhost_can_busy_poll(vq->dev, endtime) &&
+		       vhost_vq_avail_empty(vq->dev, vq))
+			cpu_relax_lowlatency();
+		preempt_enable();
+		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					out_num, in_num, NULL, NULL);
+	}
+
+	return r;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -331,10 +368,9 @@ static void handle_tx(struct vhost_net *net)
 			      % UIO_MAXIOV == nvq->done_idx))
 			break;
 
-		head = vhost_get_vq_desc(vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
+						ARRAY_SIZE(vq->iov),
+						&out, &in);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
@@ -435,6 +471,38 @@ static int peek_head_len(struct sock *sk)
 	return len;
 }
 
+static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+	unsigned long uninitialized_var(endtime);
+	int len = peek_head_len(sk);
+
+	if (!len && vq->busyloop_timeout) {
+		/* Both tx vq and rx socket were polled here */
+		mutex_lock(&vq->mutex);
+		vhost_disable_notify(&net->dev, vq);
+
+		preempt_disable();
+		endtime = busy_clock() + vq->busyloop_timeout;
+
+		while (vhost_can_busy_poll(&net->dev, endtime) &&
+		       skb_queue_empty(&sk->sk_receive_queue) &&
+		       vhost_vq_avail_empty(&net->dev, vq))
+			cpu_relax_lowlatency();
+
+		preempt_enable();
+
+		if (vhost_enable_notify(&net->dev, vq))
+			vhost_poll_queue(&vq->poll);
+		mutex_unlock(&vq->mutex);
+
+		len = peek_head_len(sk);
+	}
+
+	return len;
+}
+
 /* This is a multi-buffer version of vhost_get_desc, that works if
  *	vq has read descriptors only.
  * @vq		- the relevant virtqueue
@@ -553,7 +621,7 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
-	while ((sock_len = peek_head_len(sock->sk))) {
+	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
 		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 89301fd..b4b7eda 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -285,6 +285,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->memory = NULL;
 	vq->is_le = virtio_legacy_is_little_endian();
 	vhost_vq_reset_user_be(vq);
+	vq->busyloop_timeout = 0;
 }
 
 static int vhost_worker(void *data)
@@ -919,6 +920,19 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 	case VHOST_GET_VRING_ENDIAN:
 		r = vhost_get_vring_endian(vq, idx, argp);
 		break;
+	case VHOST_SET_VRING_BUSYLOOP_TIMEOUT:
+		if (copy_from_user(&s, argp, sizeof(s))) {
+			r = -EFAULT;
+			break;
+		}
+		vq->busyloop_timeout = s.num;
+		break;
+	case VHOST_GET_VRING_BUSYLOOP_TIMEOUT:
+		s.index = idx;
+		s.num = vq->busyloop_timeout;
+		if (copy_to_user(argp, &s, sizeof(s)))
+			r = -EFAULT;
+		break;
 	default:
 		r = -ENOIOCTLCMD;
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a7a43f0..9a02158 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -115,6 +115,7 @@ struct vhost_virtqueue {
 	/* Ring endianness requested by userspace for cross-endian support. */
 	bool user_be;
 #endif
+	u32 busyloop_timeout;
 };
 
 struct vhost_dev {
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index ab373191..61a8777 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -126,6 +126,12 @@ struct vhost_memory {
 #define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
 /* Set eventfd to signal an error */
 #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+/* Set busy loop timeout (in us) */
+#define VHOST_SET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x23,	\
+					 struct vhost_vring_state)
+/* Get busy loop timeout (in us) */
+#define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24,	\
+					 struct vhost_vring_state)
 
 /* VHOST_NET specific defines */
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH V4 2/3] vhost: introduce vhost_vq_avail_empty()
From: Jason Wang @ 2016-03-04 11:24 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: yang.zhang.wz, borntraeger, RAPOPORT
In-Reply-To: <1457090693-55974-1-git-send-email-jasowang@redhat.com>

This patch introduces a helper which returns true if we're sure
that the available ring is empty for a specific vq. When we're not
sure, e.g. on a vq access failure, it returns false instead. This can be
used by busy polling code to exit the busy loop.
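
As a rough illustration of the "true only when sure" rule described above,
here is a minimal userspace sketch. It is not the kernel helper: struct
model_avail, struct model_vq and model_vq_avail_empty() are hypothetical
names invented for this example, and a NULL ring pointer merely stands in
for a failed __get_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the guest-visible avail ring header. */
struct model_avail {
	uint16_t flags;
	uint16_t idx;	/* free-running index, wraps at 65536 */
};

/* Hypothetical stand-in for the per-queue state kept on the host side. */
struct model_vq {
	const struct model_avail *avail;	/* NULL models an access failure */
	uint16_t avail_idx;			/* last index value the host has read */
};

/*
 * Return true only when we are sure the ring holds nothing new.
 * On an access failure we answer false, so a busy loop that spins
 * while this returns true stops spinning and falls back to the normal path.
 */
static bool model_vq_avail_empty(const struct model_vq *vq)
{
	assert(vq != NULL);

	if (!vq->avail)
		return false;

	/* Plain equality stays correct across the 16-bit wrap. */
	return vq->avail->idx == vq->avail_idx;
}
```

Because the index is free-running, the equality test needs no special
handling when the guest's idx wraps from 65535 back to 0.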

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 14 ++++++++++++++
 drivers/vhost/vhost.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 90ac092..89301fd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1633,6 +1633,20 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
+/* return true if we're sure that the available ring is empty */
+bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	__virtio16 avail_idx;
+	int r;
+
+	r = __get_user(avail_idx, &vq->avail->idx);
+	if (r)
+		return false;
+
+	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+}
+EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
+
 /* OK, now we need to know about added descriptors. */
 bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 43284ad..a7a43f0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -159,6 +159,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH V4 1/3] vhost: introduce vhost_has_work()
From: Jason Wang @ 2016-03-04 11:24 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: yang.zhang.wz, borntraeger, RAPOPORT
In-Reply-To: <1457090693-55974-1-git-send-email-jasowang@redhat.com>

This patch introduces a helper which gives a hint as to whether or not
there is work queued in the work list. This can be used by busy
polling code to exit the busy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 7 +++++++
 drivers/vhost/vhost.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ad2146a..90ac092 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -245,6 +245,13 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
+/* A lockless hint for busy polling code to exit the loop */
+bool vhost_has_work(struct vhost_dev *dev)
+{
+	return !list_empty(&dev->work_list);
+}
+EXPORT_SYMBOL_GPL(vhost_has_work);
+
 void vhost_poll_queue(struct vhost_poll *poll)
 {
 	vhost_work_queue(poll->dev, &poll->work);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d3f7674..43284ad 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,6 +37,7 @@ struct vhost_poll {
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+bool vhost_has_work(struct vhost_dev *dev);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     unsigned long mask, struct vhost_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH V4 0/3] basic busy polling support for vhost_net
From: Jason Wang @ 2016-03-04 11:24 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: yang.zhang.wz, borntraeger, RAPOPORT

This series tries to add basic busy polling for vhost net. The idea is
simple: at the end of tx/rx processing, busy poll for newly added tx
descriptors and the rx receive socket for a while. The maximum amount of
time (in us) that can be spent on busy polling is specified via ioctl.
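
The idea can be sketched in plain userspace C, purely as an illustration.
Everything below is an assumption made for the example: busy_clock_us(),
busy_poll() and the two callbacks are hypothetical names, and
clock_gettime() stands in for the kernel clock. The >> 10 mirrors
busy_clock() in patch 3/3, which divides nanoseconds by 1024 rather than
1000, so a timeout of 50 corresponds to roughly 51.2 us.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Approximate microseconds: nanoseconds >> 10, i.e. divided by 1024. */
static uint64_t busy_clock_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec) >> 10;
}

/* Hypothetical callback type: returns true once there is work to do. */
typedef bool (*has_work_fn)(void *opaque);

/* Example callback: reports work after it has been asked *opaque times. */
static bool countdown_ready(void *opaque)
{
	int *remaining = opaque;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}

/* Example callback: never has work, so the poll always times out. */
static bool never_ready(void *opaque)
{
	(void)opaque;
	return false;
}

/*
 * Spin until has_work() reports work or timeout_us has elapsed.
 * Returns true if work was seen, false if the poll timed out.
 */
static bool busy_poll(has_work_fn has_work, void *opaque, uint64_t timeout_us)
{
	uint64_t endtime;

	assert(has_work != NULL);
	endtime = busy_clock_us() + timeout_us;

	while (busy_clock_us() <= endtime) {
		if (has_work(opaque))
			return true;
	}
	/* One last look, in case work arrived right at the deadline. */
	return has_work(opaque);
}
```

The kernel version additionally leaves the loop on need_resched(), pending
signals and queued vhost work; this sketch has no counterpart for those
conditions.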

Test A was done with:

- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected mlx4
- Guest with 1 vcpu and 1 queue

Results:
- Obvious improvements (5% - 20%) for latency (TCP_RR).
- Better performance or a minor regression on most of the TX tests, but
  some regression at size 4096.
- Same or better performance on RX, except for 8 sessions at size 4096.
- CPU utilization increased as expected.

TCP_RR:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
    1/     1/   +8%/  -32%/   +8%/   +8%/   +7%
    1/    50/   +7%/  -19%/   +7%/   +7%/   +1%
    1/   100/   +5%/  -21%/   +5%/   +5%/    0%
    1/   200/   +5%/  -21%/   +7%/   +7%/   +1%
   64/     1/  +11%/  -29%/  +11%/  +11%/  +10%
   64/    50/   +7%/  -19%/   +8%/   +8%/   +2%
   64/   100/   +8%/  -18%/   +9%/   +9%/   +2%
   64/   200/   +6%/  -19%/   +6%/   +6%/    0%
  256/     1/   +7%/  -33%/   +7%/   +7%/   +6%
  256/    50/   +7%/  -18%/   +7%/   +7%/    0%
  256/   100/   +9%/  -18%/   +8%/   +8%/   +2%
  256/   200/   +9%/  -18%/  +10%/  +10%/   +3%
 1024/     1/  +20%/  -28%/  +20%/  +20%/  +19%
 1024/    50/   +8%/  -18%/   +9%/   +9%/   +2%
 1024/   100/   +6%/  -19%/   +5%/   +5%/    0%
 1024/   200/   +8%/  -18%/   +9%/   +9%/   +2%
Guest TX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/     1/   -5%/  -28%/  +11%/  +12%/  +10%
   64/     4/   -2%/  -26%/  +13%/  +13%/  +13%
   64/     8/   -6%/  -29%/   +9%/  +10%/  +10%
  512/     1/  +15%/   -7%/  +13%/  +11%/   +3%
  512/     4/  +17%/   -6%/  +18%/  +13%/  +11%
  512/     8/  +14%/   -7%/  +13%/   +7%/   +7%
 1024/     1/  +27%/   -2%/  +26%/  +29%/  +12%
 1024/     4/   +8%/   -9%/   +6%/   +1%/   +6%
 1024/     8/  +41%/  +12%/  +34%/  +20%/   -3%
 4096/     1/  -22%/  -21%/  -36%/  +81%/+1360%
 4096/     4/  -57%/  -58%/ +286%/  +15%/+2074%
 4096/     8/  +67%/  +70%/  -45%/   -8%/  +63%
16384/     1/   -2%/   -5%/   +5%/   -3%/  +80%
16384/     4/    0%/    0%/    0%/   +4%/ +138%
16384/     8/    0%/    0%/    0%/   +1%/  +41%
65535/     1/   -3%/   -6%/   +2%/  +11%/ +113%
65535/     4/   -2%/   -1%/   -2%/   -3%/ +484%
65535/     8/    0%/   +1%/    0%/   +2%/  +40%
Guest RX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/     1/  +31%/   -3%/   +8%/   +8%/   +8%
   64/     4/  +11%/  -17%/  +13%/  +14%/  +15%
   64/     8/   +4%/  -23%/  +11%/  +11%/  +12%
  512/     1/  +24%/    0%/  +18%/  +14%/   -8%
  512/     4/   +4%/  -15%/   +6%/   +5%/   +6%
  512/     8/  +26%/    0%/  +21%/  +10%/   +3%
 1024/     1/  +88%/  +47%/  +69%/  +44%/  -30%
 1024/     4/  +18%/   -5%/  +19%/  +16%/   +2%
 1024/     8/  +15%/   -4%/  +13%/   +8%/   +1%
 4096/     1/   -3%/   -5%/   +2%/   -2%/  +41%
 4096/     4/   +2%/   +3%/  -20%/  -14%/  -24%
 4096/     8/  -43%/  -45%/  +69%/  -24%/  +94%
16384/     1/   -3%/  -11%/  +23%/   +7%/  +42%
16384/     4/   -3%/   -3%/   -4%/   +5%/ +115%
16384/     8/   -1%/    0%/   -1%/   -3%/  +32%
65535/     1/   +1%/    0%/   +2%/    0%/  +66%
65535/     4/   -1%/   -1%/    0%/   +4%/ +492%
65535/     8/    0%/   -1%/   -1%/   +4%/  +38%

Changes from V3:
- drop single_task_running()
- use cpu_relax_lowlatency() instead of cpu_relax()

Changes from V2:
- rename vhost_vq_more_avail() to vhost_vq_avail_empty(), and return
false when __get_user() fails.
- do not bother with preemption/timers for the good path.
- use vhost_vring_state as ioctl parameter instead of reinventing a new
one.
- add the unit of timeout (us) to the comments of the newly added ioctls

Changes from V1:
- remove the buggy vq_error() in vhost_vq_more_avail().
- leave vhost_enable_notify() untouched.

Changes from RFC V3:
- small tweak on the code to avoid multiple duplicate conditions in
critical path when busy loop is not enabled.
- add the test result of multiple VMs

Changes from RFC V2:
- poll also at the end of rx handling
- factor out the polling logic and optimize the code a little bit
- add two ioctls to get and set the busy poll timeout
- test on ixgbe (which can give more stable and reproducible numbers)
instead of mlx4.

Changes from RFC V1:
- add a comment for vhost_has_work() to explain why it could be
lockless
- add param description for busyloop_timeout
- split out the busy polling logic into a new helper
- check and exit the loop when there's a pending signal
- disable preemption during busy looping to make sure local_clock() was
correctly used.

Jason Wang (3):
  vhost: introduce vhost_has_work()
  vhost: introduce vhost_vq_avail_empty()
  vhost_net: basic polling support

 drivers/vhost/net.c        | 78 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.c      | 35 +++++++++++++++++++++
 drivers/vhost/vhost.h      |  3 ++
 include/uapi/linux/vhost.h |  6 ++++
 4 files changed, 117 insertions(+), 5 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Michael S. Tsirkin @ 2016-03-04 10:36 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	linux-kernel@vger.kernel.org, Dr. David Alan Gilbert,
	linux-mm@kvack.org, Roman Kagan, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E037717B5@SHSMSX101.ccr.corp.intel.com>

On Fri, Mar 04, 2016 at 10:11:00AM +0000, Li, Liang Z wrote:
> > On Fri, Mar 04, 2016 at 09:12:12AM +0000, Li, Liang Z wrote:
> > > > Although I wonder which is cheaper; that would be fairly expensive
> > > > for the guest wouldn't it? And you'd somehow have to kick the guest
> > > > before migration to do the ballooning - and how long would you wait for
> > it to finish?
> > >
> > > About 5 seconds for an 8G guest, balloon to 1G. Get the free pages
> > > bitmap take about 20ms for an 8G idle guest.
> > >
> > > Liang
> > 
> > Where is the time spent though? allocating within guest?
> > Or passing the info to host?
> > If the former, we can use existing inflate/deflate vqs:
> > Have guest put each free page on inflate vq, then on deflate vq.
> > 
> 
> Maybe I am not clear enough.
> 
> I mean if we inflate balloon before live migration, for a 8GB guest, it takes about 5 Seconds for the inflating operation to finish.

And these 5 seconds are spent where?

> For the PV solution, there is no need to inflate balloon before live migration, the only cost is to traversing the free_list to
>  construct the free pages bitmap, and it takes about 20ms for a 8GB idle guest( less if there is less free pages),
>  passing the free pages info to host will take about extra 3ms.
> 
> 
> Liang

So now let's please stop talking about solutions at a high level and
discuss the interface changes you make in detail.
What makes it faster? Better host/guest interface? No need to go through
buddy allocator within guest? Less interrupts? Something else?


> > --
> > MST

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Roman Kagan @ 2016-03-04 10:23 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mst@redhat.com, linux-kernel@vger.kernel.org,
	Dr. David Alan Gilbert, linux-mm@kvack.org, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E0377160A@SHSMSX101.ccr.corp.intel.com>

On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > >   I wonder if it would be possible to avoid the kernel changes by
> > > > parsing /proc/self/pagemap - if that can be used to detect
> > > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > same result?
> > >
> > > Only detect the unmapped/zero mapped pages is not enough. Consider
> > the
> > > situation like case 2, it can't achieve the same result.
> > 
> > Your case 2 doesn't exist in the real world.  If people could stop their main
> > memory consumer in the guest prior to migration they wouldn't need live
> > migration at all.
> 
> The case 2 is just a simplified scenario, not a real case.
> As long as the guest's memory usage does not keep increasing, or not always run out,
> it can be covered by the case 2.

The memory usage will keep increasing due to ever growing caches, etc,
so you'll be left with very little free memory fairly soon.

> > I tend to think you can safely assume there's no free memory in the guest, so
> > there's little point optimizing for it.
> 
> If this is true, we should not inflate the balloon either.

We certainly should if there's "available" memory, i.e. not free but
cheap to reclaim.

> > OTOH it makes perfect sense optimizing for the unmapped memory that's
> > made up, in particular, by the ballon, and consider inflating the balloon right
> > before migration unless you already maintain it at the optimal size for other
> > reasons (like e.g. a global resource manager optimizing the VM density).
> > 
> 
> Yes, I believe the current balloon works and it's simple. Do you take the performance impact for consideration?
> For and 8G guest, it takes about 5s to  inflating the balloon. But it only takes 20ms to  traverse the free_list and
> construct the free pages bitmap.

I don't have any feeling of how important the difference is.  And if the
limiting factor for balloon inflation speed is the granularity of
communication it may be worth optimizing that, because quick balloon
reaction may be important in certain resource management scenarios.

> By inflating the balloon, all the guest's pages are still be processed (zero page checking).

Not sure what you mean.  If you describe the current state of affairs
that's exactly the suggested optimization point: skip unmapped pages.

> The only advantage of ' inflating the balloon before live migration' is simple, nothing more.

That's a big advantage.  Another one is that it does something useful in
real-world scenarios.

Roman.

^ permalink raw reply

* Re: [PATCH 11/16] drm/atmel-hldcd: removed optional dummy crtc mode_fixup function.
From: Boris Brezillon @ 2016-03-04 10:17 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: nicolas.pitre, jianwei.wang.chn, alison.wang, airlied,
	daniel.vetter, Carlos Palminha, dri-devel, virtualization,
	linux-renesas-soc, jani.nikula, patrik.r.jakobsson,
	benjamin.gaignard, vincent.abriou, sudipm.mukherjee,
	laurent.pinchart
In-Reply-To: <20160217130801.GF32705@phenom.ffwll.local>

Hi Daniel,

On Wed, 17 Feb 2016 14:08:01 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Wed, Feb 17, 2016 at 09:02:44AM +0000, Carlos Palminha wrote:
> > Thanks Boris.
> > 
> > @Daniel, do you want me to resend this patch or will you fix it directly in mode-fixup git branch?
> 
> I can fix the typos, but I'm meh on the whitespace change ;-) Imo that
> doesn't need a resend.

Oops, I forgot to answer that one. I'm fine with the extra space
removal too, you can apply it.

Thanks,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH 01/16] drm: fixes crct set_mode when crtc mode_fixup is null.
From: Carlos Palminha @ 2016-03-04 10:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: nicolas.pitre, boris.brezillon, jianwei.wang.chn, airlied,
	daniel.vetter, alison.wang, patrik.r.jakobsson, virtualization,
	linux-renesas-soc, jani.nikula, dri-devel, benjamin.gaignard,
	vincent.abriou, sudipm.mukherjee, laurent.pinchart
In-Reply-To: <20160216143733.GF32705@phenom.ffwll.local>



On 16-02-2016 14:37, Daniel Vetter wrote:
> On Tue, Feb 16, 2016 at 02:10:03PM +0000, Carlos Palminha wrote:
>> This patch set nukes all the dummy crtc mode_fixup implementations.
>> (made on top of Daniel topic/drm-misc branch)
>>
>> Signed-off-by: Carlos Palminha <palminha@synopsys.com>
> 
> Applied this one to drm-misc. I'll let the others hang out there for a bit
> more to collect acks.

It seems that we are not getting more ACKs.
How can we push this forward?

> 
> Thanks, Daniel
> 
>> ---
>>  drivers/gpu/drm/drm_crtc_helper.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
>> index e70d064..7539eea 100644
>> --- a/drivers/gpu/drm/drm_crtc_helper.c
>> +++ b/drivers/gpu/drm/drm_crtc_helper.c
>> @@ -343,9 +343,12 @@ bool drm_crtc_helper_set_mode(struct drm_crtc *crtc,
>>  		}
>>  	}
>>  
>> -	if (!(ret = crtc_funcs->mode_fixup(crtc, mode, adjusted_mode))) {
>> -		DRM_DEBUG_KMS("CRTC fixup failed\n");
>> -		goto done;
>> +	if (crtc_funcs->mode_fixup) {
>> +		if (!(ret = crtc_funcs->mode_fixup(crtc, mode,
>> +						adjusted_mode))) {
>> +			DRM_DEBUG_KMS("CRTC fixup failed\n");
>> +			goto done;
>> +		}
>>  	}
>>  	DRM_DEBUG_KMS("[CRTC:%d:%s]\n", crtc->base.id, crtc->name);
>>  
>> -- 
>> 2.5.0
>>
> 

^ permalink raw reply

* RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04 10:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	linux-kernel@vger.kernel.org, Dr. David Alan Gilbert,
	linux-mm@kvack.org, Roman Kagan, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304114519-mutt-send-email-mst@redhat.com>

> On Fri, Mar 04, 2016 at 09:12:12AM +0000, Li, Liang Z wrote:
> > > Although I wonder which is cheaper; that would be fairly expensive
> > > for the guest wouldn't it? And you'd somehow have to kick the guest
> > > before migration to do the ballooning - and how long would you wait for
> it to finish?
> >
> > About 5 seconds for an 8G guest, balloon to 1G. Get the free pages
> > bitmap take about 20ms for an 8G idle guest.
> >
> > Liang
> 
> Where is the time spent though? allocating within guest?
> Or passing the info to host?
> If the former, we can use existing inflate/deflate vqs:
> Have guest put each free page on inflate vq, then on deflate vq.
> 

Maybe I was not clear enough.

I mean that if we inflate the balloon before live migration, for an 8GB guest it takes about 5 seconds for the inflating operation to finish.

For the PV solution, there is no need to inflate the balloon before live migration. The only cost is traversing the free_list to
construct the free pages bitmap, which takes about 20ms for an 8GB idle guest (less if there are fewer free pages);
passing the free pages info to the host takes about an extra 3ms.
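
To make the "traverse the free_list and construct the free pages bitmap"
step concrete, here is a small userspace sketch. It is only a model and not
the code from the patch set: struct free_block, build_free_bitmap() and
count_pages_to_send() are hypothetical names, and each free block is
assumed to be described by a start PFN and a buddy-style order.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical free-list entry: 2^order contiguous free pages at start_pfn. */
struct free_block {
	unsigned long start_pfn;
	unsigned int order;
};

static void bitmap_set_bit(unsigned long *map, unsigned long nr)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static int bitmap_test_bit(const unsigned long *map, unsigned long nr)
{
	return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Walk the free list once and set one bit per free page. Caller frees the map. */
static unsigned long *build_free_bitmap(const struct free_block *blocks,
					size_t nblocks,
					unsigned long total_pages)
{
	size_t words = (total_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;
	unsigned long *map;
	size_t i;

	assert(blocks != NULL || nblocks == 0);

	map = calloc(words ? words : 1, sizeof(*map));
	if (!map)
		return NULL;

	for (i = 0; i < nblocks; i++) {
		unsigned long npages = 1UL << blocks[i].order;
		unsigned long pfn;

		/* Ignore anything beyond the end of guest RAM. */
		for (pfn = blocks[i].start_pfn;
		     pfn < blocks[i].start_pfn + npages && pfn < total_pages;
		     pfn++)
			bitmap_set_bit(map, pfn);
	}
	return map;
}

/* Pages the bulk stage still has to process: every page not marked free. */
static unsigned long count_pages_to_send(const unsigned long *free_map,
					 unsigned long total_pages)
{
	unsigned long pfn, count = 0;

	for (pfn = 0; pfn < total_pages; pfn++)
		if (!bitmap_test_bit(free_map, pfn))
			count++;
	return count;
}
```

The cost of building the map scales with the amount of free memory, which
matches the remark above that it takes less time when there are fewer free
pages.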


Liang
> --
> MST

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Michael S. Tsirkin @ 2016-03-04  9:47 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	linux-kernel@vger.kernel.org, Dr. David Alan Gilbert,
	linux-mm@kvack.org, Roman Kagan, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E03771639@SHSMSX101.ccr.corp.intel.com>

On Fri, Mar 04, 2016 at 09:12:12AM +0000, Li, Liang Z wrote:
> > Although I wonder which is cheaper; that would be fairly expensive for the
> > guest wouldn't it? And you'd somehow have to kick the guest before
> > migration to do the ballooning - and how long would you wait for it to finish?
> 
> About 5 seconds for an 8G guest, balloon to 1G. Get the free pages bitmap take about 20ms
> for an 8G idle guest.
> 
> Liang

Where is the time spent though? allocating within guest?
Or passing the info to host?
If the former, we can use existing inflate/deflate vqs:
Have guest put each free page on inflate vq, then on deflate vq.

-- 
MST

^ permalink raw reply

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04  9:36 UTC (permalink / raw)
  To: Jitendra Kolhe, dgilbert@redhat.com
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, simhan@hpe.com,
	mst@redhat.com, qemu-devel@nongnu.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mohan_parthasarathy@hpe.com, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <1457083967-13681-1-git-send-email-jitendra.kolhe@hpe.com>

> > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > The current QEMU live migration implementation mark the all the
> > > > guest's RAM pages as dirtied in the ram bulk stage, all these
> > > > pages will be processed and that takes quit a lot of CPU cycles.
> > > >
> > > > From guest's point of view, it doesn't care about the content in
> > > > free pages. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage, it can save a lot CPU cycles and
> > > > reduce the network traffic significantly while speed up the live
> > > > migration process obviously.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use
> > > > it to filter out the guest's free pages in the ram bulk stage.
> > > > This make the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the
> > > balloon.
> 
> We were also trying to address a similar problem, without actually needing to
> modify the guest driver. Please find the patch details in the mail with the subject:
> migration: skip sending ram pages released by virtio-balloon driver
> 
> Thanks,
> - Jitendra
> 

Great! Thanks for your information.

Liang
> >
> > >   I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> same result?
> > >

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Roman Kagan @ 2016-03-04  9:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, Li, Liang Z,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, mst@redhat.com, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304090820.GA2149@work-vm>

On Fri, Mar 04, 2016 at 09:08:20AM +0000, Dr. David Alan Gilbert wrote:
> * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > The unmapped/zero mapped pages can be detected by parsing /proc/self/pagemap,
> > > but free pages can't be detected this way. Imagine an application that allocates a large amount
> > > of memory and frees it after use, and then live migration happens. All these free pages
> > > will be processed and sent to the destination, which is not optimal.
> > 
> > First, the likelihood of such a situation is marginal, there's no point
> > optimizing for it specifically.
> > 
> > And second, even if that happens, you inflate the balloon right before
> > the migration and the free memory will get unmapped very quickly, so this
> > case is covered nicely by the same technique that works for more
> > realistic cases, too.
> 
> Although I wonder which is cheaper; that would be fairly expensive for
> the guest wouldn't it?

For the guest -- generally it wouldn't if you have a good estimate of
available memory (i.e. the amount you can balloon out without forcing
the guest to swap).

And yes you need certain cost estimates for choosing the best migration
strategy: e.g. if your network bandwidth is unlimited you may be better
off transferring the zeros to the destination rather than optimizing
them away.

> And you'd somehow have to kick the guest
> before migration to do the ballooning - and how long would you wait
> for it to finish?

It's a matter for fine-tuning with all the inputs at hand, like network
bandwidth, costs of delaying the migration, etc.  And you don't need to
wait for it to finish, i.e. reach the balloon size target: you can start
the migration as soon as it's good enough (for whatever definition of
"enough" is found appropriate by that fine-tuning).

Roman.

^ permalink raw reply

* [RFC qemu 0/4] A PV solution for live migration optimization
From: Jitendra Kolhe @ 2016-03-04  9:32 UTC (permalink / raw)
  To: liang.z.li, dgilbert
  Cc: ehabkost, kvm, simhan, mst, qemu-devel, linux-kernel,
	jitendra.kolhe, linux-mm, mohan_parthasarathy, amit.shah,
	pbonzini, akpm, virtualization, rth

> >
> > * Liang Li (liang.z.li@intel.com) wrote:
> > > The current QEMU live migration implementation mark the all the
> > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > will be processed and that takes quit a lot of CPU cycles.
> > >
> > > From guest's point of view, it doesn't care about the content in free
> > > pages. We can make use of this fact and skip processing the free pages
> > > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > > network traffic significantly while speed up the live migration
> > > process obviously.
> > >
> > > This patch set is the QEMU side implementation.
> > >
> > > The virtio-balloon is extended so that QEMU can get the free pages
> > > information from the guest through virtio.
> > >
> > > After getting the free pages information (a bitmap), QEMU can use it
> > > to filter out the guest's free pages in the ram bulk stage. This make
> > > the live migration process much more efficient.
> >
> > Hi,
> >   An interesting solution; I know a few different people have been looking at
> > how to speed up ballooned VM migration.
> >
>
> Ooh, different solutions for the same purpose, and both based on the balloon.

We were also trying to address a similar problem, without actually needing to modify
the guest driver. Please find the patch details in the mail with the subject:
migration: skip sending ram pages released by virtio-balloon driver

Thanks,
- Jitendra

>
> >   I wonder if it would be possible to avoid the kernel changes by parsing
> > /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> > pages in the guest ram, would it achieve the same result?
> >
>
> Only detect the unmapped/zero mapped pages is not enough. Consider the
> situation like case 2, it can't achieve the same result.
>
> > > This RFC version doesn't take the post-copy and RDMA into
> > > consideration, maybe both of them can benefit from this PV solution by
> > > with some extra modifications.
> >
> > For postcopy to be safe, you would still need to send a message to the
> > destination telling it that there were zero pages, otherwise the destination
> > can't tell if it's supposed to request the page from the source or treat the
> > page as zero.
> >
> > Dave
>
> I will consider this later, thanks, Dave.
>
> Liang
>
> >
> > >
> > > Performance data
> > > ================
> > >
> > > Test environment:
> > >
> > > > CPU: Intel (R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz Host RAM: 64GB
> > > Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
> > > Guest Linux Kernel:  4.5.rc6        Guest OS: CentOS 6.6
> > > Network:  X540-AT2 with 10 Gigabit connection Guest RAM: 8GB
> > >
> > > Case 1: Idle guest just boots:
> > > ============================================
> > >                     | original  |    pv
> > > -------------------------------------------
> > > total time(ms)      |    1894   |   421
> > > --------------------------------------------
> > > transferred ram(KB) |   398017  |  353242
> > > ============================================
> > >
> > >
> > > Case 2: The guest has ever run some memory consuming workload, the
> > > workload is terminated just before live migration.
> > > ============================================
> > >                     | original  |    pv
> > > -------------------------------------------
> > > total time(ms)      |   7436    |   552
> > > --------------------------------------------
> > > transferred ram(KB) |  8146291  |  361375
> > > ============================================
> > >
>
>

^ permalink raw reply

* RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04  9:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Roman Kagan, ehabkost@redhat.com,
	kvm@vger.kernel.org, mst@redhat.com, quintela@redhat.com,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304090820.GA2149@work-vm>

> * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > > > The current QEMU live migration implementation mark the all
> > > > > > the guest's RAM pages as dirtied in the ram bulk stage, all
> > > > > > these pages will be processed and that takes quit a lot of CPU cycles.
> > > > > >
> > > > > > From guest's point of view, it doesn't care about the content
> > > > > > in free pages. We can make use of this fact and skip
> > > > > > processing the free pages in the ram bulk stage, it can save a
> > > > > > lot CPU cycles and reduce the network traffic significantly
> > > > > > while speed up the live migration process obviously.
> > > > > >
> > > > > > This patch set is the QEMU side implementation.
> > > > > >
> > > > > > The virtio-balloon is extended so that QEMU can get the free
> > > > > > pages information from the guest through virtio.
> > > > > >
> > > > > > After getting the free pages information (a bitmap), QEMU can
> > > > > > use it to filter out the guest's free pages in the ram bulk
> > > > > > stage. This make the live migration process much more efficient.
> > > > >
> > > > > Hi,
> > > > >   An interesting solution; I know a few different people have
> > > > > been looking at how to speed up ballooned VM migration.
> > > > >
> > > > >   I wonder if it would be possible to avoid the kernel changes
> > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > > the same result?
> > > >
> > > > Yes I was about to suggest the same thing: it's simple and makes
> > > > use of the existing infrastructure.  And you wouldn't need to care
> > > > if the pages were unmapped by ballooning or anything else
> > > > (alternative balloon implementations, not yet touched by the
> > > > guest, etc.).  Besides, you wouldn't need to synchronize with the guest.
> > > >
> > > > Roman.
> > >
> > > > The unmapped/zero mapped pages can be detected by parsing
> > > > /proc/self/pagemap, but free pages can't be detected this way.
> > > > Imagine an application that allocates a large amount of memory and
> > > > frees it after use, and then live migration happens. All these free
> > > > pages will be processed and sent to the destination, which is not optimal.
> >
> > First, the likelihood of such a situation is marginal, there's no
> > point optimizing for it specifically.
> >
> > And second, even if that happens, you inflate the balloon right before
> > > the migration and the free memory will get unmapped very quickly, so
> > this case is covered nicely by the same technique that works for more
> > realistic cases, too.
> 
> Although I wonder which is cheaper; that would be fairly expensive for the
> guest wouldn't it? And you'd somehow have to kick the guest before
> migration to do the ballooning - and how long would you wait for it to finish?

About 5 seconds for an 8G guest ballooned down to 1G. Getting the free pages bitmap takes about 20ms
for an 8G idle guest.

Liang

> 
> Dave
> 
> >
> > Roman.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply

* RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04  9:08 UTC (permalink / raw)
  To: Roman Kagan
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mst@redhat.com, linux-kernel@vger.kernel.org,
	Dr. David Alan Gilbert, linux-mm@kvack.org, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304081411.GD9100@rkaganb.sw.ru>

> On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > >   I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > same result?
> >
> > Only detect the unmapped/zero mapped pages is not enough. Consider the
> > situation like case 2, it can't achieve the same result.
> 
> Your case 2 doesn't exist in the real world.  If people could stop their main
> memory consumer in the guest prior to migration they wouldn't need live
> migration at all.

Case 2 is just a simplified scenario, not a real case.
As long as the guest's memory usage does not keep increasing, or the guest does not always run out of memory,
it is covered by case 2.

> I tend to think you can safely assume there's no free memory in the guest, so
> there's little point optimizing for it.

If this is true, we should not inflate the balloon either.

> OTOH it makes perfect sense optimizing for the unmapped memory that's
> made up, in particular, by the balloon, and consider inflating the balloon right
> before migration unless you already maintain it at the optimal size for other
> reasons (like e.g. a global resource manager optimizing the VM density).
> 

Yes, I believe the current balloon works and it's simple. Have you taken the performance impact into consideration?
For an 8G guest, it takes about 5s to inflate the balloon, but it only takes 20ms to traverse the free_list and
construct the free pages bitmap. During this period, the guest is very busy.

Even with the balloon inflated, all of the guest's pages are still processed (zero page checking).

The only advantage of 'inflating the balloon before live migration' is simplicity, nothing more.
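
The filtering step itself can be sketched as below. This is a simplified
illustration, not the code from the patch set: it assumes both bitmaps use one
bit per guest page with the same indexing, and filter_free_pages() is a
hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: clear from the migration bitmap every page that the
 * guest reported as free. Returns the number of pages that were filtered out.
 */
size_t filter_free_pages(unsigned long *migration_bmap,
                         const unsigned long *free_bmap,
                         size_t nr_pages)
{
    size_t filtered = 0;

    assert(migration_bmap != NULL && free_bmap != NULL);

    for (size_t pfn = 0; pfn < nr_pages; pfn++) {
        size_t word = pfn / BITS_PER_LONG;
        unsigned long mask = 1UL << (pfn % BITS_PER_LONG);

        /* Only pages that are both pending and free are skipped. */
        if ((migration_bmap[word] & mask) && (free_bmap[word] & mask)) {
            migration_bmap[word] &= ~mask;
            filtered++;
        }
    }
    return filtered;
}
```

A page that the guest reuses after the bitmap was taken would still have to be
picked up by dirty logging, so a filter like this is only meant for the bulk stage.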

Liang

> Roman.

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Dr. David Alan Gilbert @ 2016-03-04  9:08 UTC (permalink / raw)
  To: Roman Kagan, Li, Liang Z, ehabkost@redhat.com,
	kvm@vger.kernel.org, mst@redhat.com, quintela@redhat.com,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304083550.GE9100@rkaganb.sw.ru>

* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > > The current QEMU live migration implementation mark the all the
> > > > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > > > will be processed and that takes quit a lot of CPU cycles.
> > > > >
> > > > > From guest's point of view, it doesn't care about the content in
> > > > > free pages. We can make use of this fact and skip processing the
> > > > > free pages in the ram bulk stage, it can save a lot CPU cycles and
> > > > > reduce the network traffic significantly while speed up the live
> > > > > migration process obviously.
> > > > >
> > > > > This patch set is the QEMU side implementation.
> > > > >
> > > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > > information from the guest through virtio.
> > > > >
> > > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > > to filter out the guest's free pages in the ram bulk stage. This
> > > > > make the live migration process much more efficient.
> > > >
> > > > Hi,
> > > >   An interesting solution; I know a few different people have been
> > > > looking at how to speed up ballooned VM migration.
> > > >
> > > >   I wonder if it would be possible to avoid the kernel changes by
> > > > parsing /proc/self/pagemap - if that can be used to detect
> > > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > > > same result?
> > > 
> > > Yes I was about to suggest the same thing: it's simple and makes use of the
> > > existing infrastructure.  And you wouldn't need to care if the pages were
> > > unmapped by ballooning or anything else (alternative balloon
> > > implementations, not yet touched by the guest, etc.).  Besides, you wouldn't
> > > need to synchronize with the guest.
> > > 
> > > Roman.
> > 
> > The unmapped/zero mapped pages can be detected by parsing /proc/self/pagemap,
> > but free pages can't be detected this way. Imagine an application that allocates a large amount
> > of memory and frees it after use, and then live migration happens. All these free pages
> > will be processed and sent to the destination, which is not optimal.
> 
> First, the likelihood of such a situation is marginal, there's no point
> optimizing for it specifically.
> 
> And second, even if that happens, you inflate the balloon right before
> the migration and the free memory will get unmapped very quickly, so this
> case is covered nicely by the same technique that works for more
> realistic cases, too.

Although I wonder which is cheaper; that would be fairly expensive for
the guest wouldn't it? And you'd somehow have to kick the guest
before migration to do the ballooning - and how long would you wait
for it to finish?

Dave

> 
> Roman.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Roman Kagan @ 2016-03-04  8:35 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mst@redhat.com, linux-kernel@vger.kernel.org,
	Dr. David Alan Gilbert, linux-mm@kvack.org, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E037714DA@SHSMSX101.ccr.corp.intel.com>

On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > The current QEMU live migration implementation mark the all the
> > > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > > will be processed and that takes quit a lot of CPU cycles.
> > > >
> > > > From guest's point of view, it doesn't care about the content in
> > > > free pages. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage, it can save a lot CPU cycles and
> > > > reduce the network traffic significantly while speed up the live
> > > > migration process obviously.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > to filter out the guest's free pages in the ram bulk stage. This
> > > > make the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> > >   I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > > same result?
> > 
> > Yes I was about to suggest the same thing: it's simple and makes use of the
> > existing infrastructure.  And you wouldn't need to care if the pages were
> > unmapped by ballooning or anything else (alternative balloon
> > implementations, not yet touched by the guest, etc.).  Besides, you wouldn't
> > need to synchronize with the guest.
> > 
> > Roman.
> 
> The unmapped/zero mapped pages can be detected by parsing /proc/self/pagemap,
> but free pages can't be detected this way. Imagine an application that allocates a large amount
> of memory and frees it after use, and then live migration happens. All these free pages
> will be processed and sent to the destination, which is not optimal.

First, the likelihood of such a situation is marginal, there's no point
optimizing for it specifically.

And second, even if that happens, you inflate the balloon right before
the migration and the free memory will get unmapped very quickly, so this
case is covered nicely by the same technique that works for more
realistic cases, too.

Roman.

^ permalink raw reply

* RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-04  8:23 UTC (permalink / raw)
  To: Roman Kagan, Dr. David Alan Gilbert
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160304075538.GC9100@rkaganb.sw.ru>

> On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liang Li (liang.z.li@intel.com) wrote:
> > > The current QEMU live migration implementation mark the all the
> > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > will be processed and that takes quit a lot of CPU cycles.
> > >
> > > From guest's point of view, it doesn't care about the content in
> > > free pages. We can make use of this fact and skip processing the
> > > free pages in the ram bulk stage, it can save a lot CPU cycles and
> > > reduce the network traffic significantly while speed up the live
> > > migration process obviously.
> > >
> > > This patch set is the QEMU side implementation.
> > >
> > > The virtio-balloon is extended so that QEMU can get the free pages
> > > information from the guest through virtio.
> > >
> > > After getting the free pages information (a bitmap), QEMU can use it
> > > to filter out the guest's free pages in the ram bulk stage. This
> > > make the live migration process much more efficient.
> >
> > Hi,
> >   An interesting solution; I know a few different people have been
> > looking at how to speed up ballooned VM migration.
> >
> >   I wonder if it would be possible to avoid the kernel changes by
> > parsing /proc/self/pagemap - if that can be used to detect
> > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > same result?
> 
> Yes I was about to suggest the same thing: it's simple and makes use of the
> existing infrastructure.  And you wouldn't need to care if the pages were
> unmapped by ballooning or anything else (alternative balloon
> implementations, not yet touched by the guest, etc.).  Besides, you wouldn't
> need to synchronize with the guest.
> 
> Roman.

The unmapped/zero mapped pages can be detected by parsing /proc/self/pagemap,
but free pages can't be detected this way. Imagine an application that allocates a large amount
of memory and frees it after use, and then live migration happens. All these free pages
will be processed and sent to the destination, which is not optimal.
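
For reference, the pagemap check under discussion can be sketched as follows.
This is only an illustration: the bit positions (bit 63 = page present,
bit 62 = page swapped) follow the kernel's pagemap documentation, while the
helper names are hypothetical and not taken from QEMU. A page that the guest
has freed but previously touched still shows up as present on the host, which
is why a check like this cannot find guest-free pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of one 64-bit /proc/<pid>/pagemap entry. */
#define PM_PRESENT (UINT64_C(1) << 63)  /* page is resident in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62)  /* page is in swap */

/* Hypothetical helper: true if the host has no backing for this page. */
bool pagemap_entry_unbacked(uint64_t entry)
{
    return (entry & (PM_PRESENT | PM_SWAPPED)) == 0;
}

/* Hypothetical helper: count the entries that have no host backing. */
size_t pagemap_count_unbacked(const uint64_t *entries, size_t n)
{
    size_t count = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pagemap_entry_unbacked(entries[i])) {
            count++;
        }
    }
    return count;
}
```

A caller would read one 8-byte entry per virtual page from /proc/self/pagemap,
at file offset (vaddr / page_size) * 8.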

Liang

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Roman Kagan @ 2016-03-04  8:14 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mst@redhat.com, linux-kernel@vger.kernel.org,
	Dr. David Alan Gilbert, linux-mm@kvack.org, amit.shah@redhat.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E03770E33@SHSMSX101.ccr.corp.intel.com>

On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> >   I wonder if it would be possible to avoid the kernel changes by parsing
> > /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> > pages in the guest ram, would it achieve the same result?
> 
> Only detect the unmapped/zero mapped pages is not enough. Consider the 
> situation like case 2, it can't achieve the same result.

Your case 2 doesn't exist in the real world.  If people could stop their
main memory consumer in the guest prior to migration they wouldn't need
live migration at all.

I tend to think you can safely assume there's no free memory in the
guest, so there's little point optimizing for it.

OTOH it makes perfect sense optimizing for the unmapped memory that's
made up, in particular, by the balloon, and consider inflating the
balloon right before migration unless you already maintain it at the
optimal size for other reasons (like e.g. a global resource manager
optimizing the VM density).

Roman.

^ permalink raw reply

* Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
From: Roman Kagan @ 2016-03-04  7:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: ehabkost, kvm, Liang Li, linux-kernel, qemu-devel, linux-mm, mst,
	amit.shah, pbonzini, akpm, virtualization, rth
In-Reply-To: <20160303174615.GF2115@work-vm>

On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z.li@intel.com) wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> > 
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free
> > pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> > the network traffic significantly while speed up the live migration
> > process obviously.
> > 
> > This patch set is the QEMU side implementation.
> > 
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> > 
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> 
> Hi,
>   An interesting solution; I know a few different people have been looking
> at how to speed up ballooned VM migration.
> 
>   I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
> mapped pages in the guest ram, would it achieve the same result?

Yes I was about to suggest the same thing: it's simple and makes use of
the existing infrastructure.  And you wouldn't need to care if the pages
were unmapped by ballooning or anything else (alternative balloon
implementations, not yet touched by the guest, etc.).  Besides, you
wouldn't need to synchronize with the guest.

Roman.

^ permalink raw reply

* RE: [Qemu-devel] [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
From: Li, Liang Z @ 2016-03-04  2:43 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com,
	rth@twiddle.net
In-Reply-To: <20160303124520.GE32270@redhat.com>

> On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >                                              DIRTY_MEMORY_MIGRATION);
> >      }
> >      memory_global_dirty_log_start();
> > +
> > +    if (balloon_free_pages_support() &&
> > +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                               &free_pages_count) == 0) {
> > +        qemu_mutex_unlock_iothread();
> > +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                                      &free_pages_count) == 0) {
> > +            usleep(1000);
> > +        }
> > +        qemu_mutex_lock_iothread();
> > +
> > +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> > +    }
> 
> IIUC, this code is synchronous with respect to the guest OS balloon driver, i.e. it is asking
> the guest for free pages and waiting for a response. If the guest OS has
> crashed this is going to mean QEMU waits forever and thus migration won't
> complete. Similarly you need to consider that the guest OS may be malicious
> and simply never respond.
> 
> So if the migration code is going to use the guest balloon driver to get info
> about free pages it has to be done in an asynchronous manner so that
> migration can never be stalled by a slow/crashed/malicious guest driver.
> 
> Regards,
> Daniel

Really,  thanks a lot!

Liang

^ permalink raw reply

* RE: [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
From: Li, Liang Z @ 2016-03-04  2:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com,
	rth@twiddle.net
In-Reply-To: <20160303132334.5e4565df.cornelia.huck@de.ibm.com>

> On Thu,  3 Mar 2016 18:44:26 +0800
> Liang Li <liang.z.li@intel.com> wrote:
> 
> > Extend the virtio balloon device to support a new feature, this new
> > feature can help to get guest's free pages information, which can be
> > used for live migration optimization.
> 
> Do you have a spec for this, e.g. as a patch to the virtio spec?

Not yet.
> 
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  balloon.c                                       | 30 ++++++++-
> >  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
> >  include/hw/virtio/virtio-balloon.h              | 17 +++++-
> >  include/standard-headers/linux/virtio_balloon.h |  1 +
> >  include/sysemu/balloon.h                        | 10 ++-
> >  5 files changed, 134 insertions(+), 5 deletions(-)
> 
> > +static int virtio_balloon_free_pages(void *opaque,
> > +                                     unsigned long *free_pages_bitmap,
> > +                                     unsigned long *free_pages_count)
> > +{
> > +    VirtIOBalloon *s = opaque;
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +    VirtQueueElement *elem = s->free_pages_vq_elem;
> > +    int len;
> > +
> > +    if (!balloon_free_pages_supported(s)) {
> > +        return -1;
> > +    }
> > +
> > +    if (s->req_status == NOT_STARTED) {
> > +        s->free_pages_bitmap = free_pages_bitmap;
> > +        s->req_status = STARTED;
> > +        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));
> 
> Please don't leak pc-specific information into generic code.

I had already noticed that and just left it in for this initial RFC version;
the hard part of this solution is how to handle different architectures ...

Thanks!

Liang

^ permalink raw reply

* RE: [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
From: Li, Liang Z @ 2016-03-04  2:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com,
	rth@twiddle.net
In-Reply-To: <20160303131616.753f1de5.cornelia.huck@de.ibm.com>

> On Thu,  3 Mar 2016 18:44:28 +0800
> Liang Li <liang.z.li@intel.com> wrote:
> 
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  migration/ram.c | 52
> > ++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> >
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void
> *opaque)
> >                                              DIRTY_MEMORY_MIGRATION);
> >      }
> >      memory_global_dirty_log_start();
> > +
> > +    if (balloon_free_pages_support() &&
> > +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                               &free_pages_count) == 0) {
> > +        qemu_mutex_unlock_iothread();
> > +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                                      &free_pages_count) == 0) {
> > +            usleep(1000);
> > +        }
> > +        qemu_mutex_lock_iothread();
> > +
> > +
> > +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> 
> A general comment: Using the ballooner to get information about pages that
> can be filtered out is too limited (there may be other ways to do this; we
> might be able to use cmma on s390, for example), and I don't like hardcoding
> to a specific method.
> 
> What about the reverse approach: Code may register a handler that
> populates the free_pages_bitmap which is called during this stage?

Good suggestion, thanks!

Liang
> <I like the idea of filtering in general, but I haven't looked at the code yet>
> 
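
The handler-registration idea suggested above could look roughly like the
sketch below. This is only an illustration under stated assumptions, not
code from the patch series or from QEMU: free_pages_register_handler(),
filter_out_free_pages(), the free_pages_handler callback type and the
example_range_handler() source are all hypothetical names. Each registered
source (a balloon device, cmma on s390, ...) fills a scratch bitmap with the
pages it believes are free, and the generic code clears those bits from the
migration bitmap without knowing where the information came from.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(nbits) (((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define MAX_HANDLERS 4

/* Hypothetical callback: set bit N in bmap if guest page N is free.
 * Returns 0 on success, nonzero if no information is available. */
typedef int (*free_pages_handler)(unsigned long *bmap, size_t nbits,
                                  void *opaque);

struct handler_entry {
    free_pages_handler fn;
    void *opaque;
};

static struct handler_entry handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Register a source of free-page information; -1 if the table is full. */
int free_pages_register_handler(free_pages_handler fn, void *opaque)
{
    if (fn == NULL || nr_handlers == MAX_HANDLERS) {
        return -1;
    }
    handlers[nr_handlers].fn = fn;
    handlers[nr_handlers].opaque = opaque;
    nr_handlers++;
    return 0;
}

/* Clear every page reported free by any handler from migration_bmap.
 * scratch must hold BITMAP_LONGS(nbits) longs. Returns the number of
 * pages that were set in migration_bmap and are now skipped. */
size_t filter_out_free_pages(unsigned long *migration_bmap,
                             unsigned long *scratch, size_t nbits)
{
    size_t longs = BITMAP_LONGS(nbits);
    size_t skipped = 0;

    assert(migration_bmap != NULL && scratch != NULL);

    for (size_t i = 0; i < nr_handlers; i++) {
        memset(scratch, 0, longs * sizeof(unsigned long));
        if (handlers[i].fn(scratch, nbits, handlers[i].opaque) != 0) {
            continue;   /* this source has nothing to report */
        }
        for (size_t j = 0; j < longs; j++) {
            unsigned long hit = migration_bmap[j] & scratch[j];

            while (hit) {           /* count the bits being cleared */
                hit &= hit - 1;
                skipped++;
            }
            migration_bmap[j] &= ~scratch[j];
        }
    }
    return skipped;
}

/* Toy source for illustration: reports pages first..last as free. */
struct page_range {
    size_t first;
    size_t last;
};

int example_range_handler(unsigned long *bmap, size_t nbits, void *opaque)
{
    const struct page_range *r = opaque;

    for (size_t p = r->first; p <= r->last && p < nbits; p++) {
        bmap[p / BITS_PER_LONG] |= 1UL << (p % BITS_PER_LONG);
    }
    return 0;
}
```

With this shape, the setup stage would only call filter_out_free_pages(),
and a balloon-based or s390-specific source would register itself at device
or machine init time. How a real handler waits for the guest's answer (the
polling loop in the patch) is deliberately left out of the sketch.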


* RE: [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
From: Li, Liang Z @ 2016-03-04  2:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com,
	rth@twiddle.net
In-Reply-To: <20160303125651.GA21382@redhat.com>

> Subject: Re: [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon
> device
> 
> On Thu, Mar 03, 2016 at 06:44:26PM +0800, Liang Li wrote:
> > Extend the virtio balloon device to support a new feature, this new
> > feature can help to get guest's free pages information, which can be
> > used for live migration optimization.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> 
> I don't understand why we need a new interface.
> Balloon already sends free pages to host.
> Just teach host to skip these pages.
> 

I just make use of the current virtio-balloon implementation; it's more
complicated to invent a new virtio device...
Actually, there is no need to inflate the balloon before live migration, so the host has
no information about the guest's free pages. That's why I added a new interface.

> Maybe instead of starting with code, you should send a high level description
> to the virtio tc for consideration?
> 
> You can do it through the mailing list or using the web form:
> http://www.oasis-open.org/committees/comments/form.php?wg_abbrev=virtio
> 

Thanks for your information and suggestion.

Liang

