* RE: [PATCH V2 net-next 2/6] sctp: Handle sctp packets with CHECKSUM_PARTIAL
From: David Laight @ 2018-08-20 15:39 UTC (permalink / raw)
To: 'Marcelo Ricardo Leitner', Vladislav Yasevich
Cc: virtio-dev@lists.oasis-open.org, nhorman@tuxdriver.com,
mst@redhat.com, netdev@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-sctp@vger.kernel.org
In-Reply-To: <20180820145415.GA5310@localhost.localdomain>
From: Marcelo Ricardo Leitner
> Sent: 20 August 2018 15:54
> On Wed, May 02, 2018 at 11:38:24AM -0300, Marcelo Ricardo Leitner wrote:
> > On Tue, May 01, 2018 at 10:07:35PM -0400, Vladislav Yasevich wrote:
> > > With SCTP checksum offload available in virtio, it is now
> > > possible for virtio to receive a sctp packet with CHECKSUM_PARTIAL
> > > set (guest-to-guest traffic). SCTP doesn't really have a
> > > partial checksum like TCP does, because CRC32c can't do partial
> > > additive checksumming.
...
Actually that isn't entirely true.
For all crc, crc(a) ^ crc(b) == crc(a^b).
Since crc(0) == 0 you can xor together two separately calculated crc
provided they both end at the same point.
The slight problem is that you are more likely to be appending
one buffer to another - which requires appending lots of zero
bytes to one of the crcs.
This could be speeded up by using lookup tables that add moderate
sized blocks of zero bytes to a crc instead of adding the zero
bytes one at a time.
Doing it without large const data and/or data cache trashing
is left as an exercise to the implementer.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH net-next v8 0/7] net: vhost: improve performance when enable busyloop
From: Michael S. Tsirkin @ 2018-08-20 20:34 UTC (permalink / raw)
To: xiangxia.m.yue; +Cc: netdev, virtualization
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
On Sun, Aug 19, 2018 at 05:11:19AM -0700, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patches improve the guest receive performance.
> On the handle_tx side, we poll the sock receive queue
> at the same time. handle_rx do that in the same way.
>
> For more performance report, see patch 4, 6, 7
Thanks for the patches. I'm traveling this week,
will do my best to review next week.
> Tonghao Zhang (7):
> net: vhost: lock the vqs one by one
> net: vhost: replace magic number of lock annotation
> net: vhost: factor out busy polling logic to vhost_net_busy_poll()
> net: vhost: add rx busy polling in tx path
> net: vhost: introduce bitmap for vhost_poll
> net: vhost: disable rx wakeup during tx busypoll
> net: vhost: make busyloop_intr more accurate
>
> drivers/vhost/net.c | 169 +++++++++++++++++++++++++++++++-------------------
> drivers/vhost/vhost.c | 41 ++++++------
> drivers/vhost/vhost.h | 7 ++-
> 3 files changed, 133 insertions(+), 84 deletions(-)
>
> --
> 1.8.3.1
^ permalink raw reply
* Re: [PATCH net-next v8 7/7] net: vhost: make busyloop_intr more accurate
From: Jason Wang @ 2018-08-21 0:33 UTC (permalink / raw)
To: xiangxia.m.yue, mst, makita.toshiaki; +Cc: netdev, virtualization
In-Reply-To: <1534680686-3108-8-git-send-email-xiangxia.m.yue@gmail.com>
On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> The patch uses vhost_has_work_pending() to check if
> the specified handler is scheduled, because in the most case,
> vhost_has_work() return true when other side handler is added
> to worker list. Use the vhost_has_work_pending() insead of
> vhost_has_work().
>
> Topology:
> [Host] ->linux bridge -> tap vhost-net ->[Guest]
>
> TCP_STREAM (netperf):
> * Without the patch: 38035.39 Mbps, 3.37 us mean latency
> * With the patch: 38409.44 Mbps, 3.34 us mean latency
The improvement is not obvious as last version. Do you imply there's
some recent changes of vhost that make it faster?
Thanks
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> drivers/vhost/net.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index db63ae2..b6939ef 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -487,10 +487,8 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> endtime = busy_clock() + busyloop_timeout;
>
> while (vhost_can_busy_poll(endtime)) {
> - if (vhost_has_work(&net->dev)) {
> - *busyloop_intr = true;
> + if (vhost_has_work(&net->dev))
> break;
> - }
>
> if ((sock_has_rx_data(sock) &&
> !vhost_vq_avail_empty(&net->dev, rvq)) ||
> @@ -513,6 +511,11 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> !vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
> vhost_net_enable_vq(net, rvq);
>
> + if (vhost_has_work_pending(&net->dev,
> + poll_rx ?
> + VHOST_NET_VQ_RX: VHOST_NET_VQ_TX))
> + *busyloop_intr = true;
> +
> mutex_unlock(&vq->mutex);
> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v8 5/7] net: vhost: introduce bitmap for vhost_poll
From: Jason Wang @ 2018-08-21 0:45 UTC (permalink / raw)
To: xiangxia.m.yue, mst, makita.toshiaki; +Cc: netdev, virtualization
In-Reply-To: <1534680686-3108-6-git-send-email-xiangxia.m.yue@gmail.com>
On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> The bitmap of vhost_dev can help us to check if the
> specified poll is scheduled. This patch will be used
> for next two patches.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> drivers/vhost/net.c | 11 +++++++++--
> drivers/vhost/vhost.c | 17 +++++++++++++++--
> drivers/vhost/vhost.h | 7 ++++++-
> 3 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 1eff72d..23d7ffc 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1135,8 +1135,15 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> }
> vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
>
> - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
> - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
> + vhost_poll_init(n->poll + VHOST_NET_VQ_TX,
> + handle_tx_net,
> + VHOST_NET_VQ_TX,
> + EPOLLOUT, dev);
> +
> + vhost_poll_init(n->poll + VHOST_NET_VQ_RX,
> + handle_rx_net,
> + VHOST_NET_VQ_RX,
> + EPOLLIN, dev);
>
> f->private_data = n;
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a1c06e7..dc88a60 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -186,7 +186,7 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
>
> /* Init poll structure */
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> - __poll_t mask, struct vhost_dev *dev)
> + __u8 poll_id, __poll_t mask, struct vhost_dev *dev)
> {
> init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
> init_poll_funcptr(&poll->table, vhost_poll_func);
> @@ -194,6 +194,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> poll->dev = dev;
> poll->wqh = NULL;
>
> + poll->poll_id = poll_id;
> vhost_work_init(&poll->work, fn);
> }
> EXPORT_SYMBOL_GPL(vhost_poll_init);
> @@ -276,8 +277,16 @@ bool vhost_has_work(struct vhost_dev *dev)
> }
> EXPORT_SYMBOL_GPL(vhost_has_work);
>
> +bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id)
> +{
> + return !llist_empty(&dev->work_list) &&
> + test_bit(poll_id, dev->work_pending);
I think we've already had something similar. E.g can we test
VHOST_WORK_QUEUED instead?
Thanks
> +}
> +EXPORT_SYMBOL_GPL(vhost_has_work_pending);
> +
> void vhost_poll_queue(struct vhost_poll *poll)
> {
> + set_bit(poll->poll_id, poll->dev->work_pending);
> vhost_work_queue(poll->dev, &poll->work);
> }
> EXPORT_SYMBOL_GPL(vhost_poll_queue);
> @@ -354,6 +363,7 @@ static int vhost_worker(void *data)
> if (!node)
> schedule();
>
> + bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
> node = llist_reverse_order(node);
> /* make sure flag is seen after deletion */
> smp_wmb();
> @@ -420,6 +430,8 @@ void vhost_dev_init(struct vhost_dev *dev,
> struct vhost_virtqueue *vq;
> int i;
>
> + BUG_ON(nvqs > VHOST_DEV_MAX_VQ);
> +
> dev->vqs = vqs;
> dev->nvqs = nvqs;
> mutex_init(&dev->mutex);
> @@ -428,6 +440,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> dev->iotlb = NULL;
> dev->mm = NULL;
> dev->worker = NULL;
> + bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
> init_llist_head(&dev->work_list);
> init_waitqueue_head(&dev->wait);
> INIT_LIST_HEAD(&dev->read_list);
> @@ -445,7 +458,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> vhost_vq_reset(dev, vq);
> if (vq->handle_kick)
> vhost_poll_init(&vq->poll, vq->handle_kick,
> - EPOLLIN, dev);
> + i, EPOLLIN, dev);
> }
> }
> EXPORT_SYMBOL_GPL(vhost_dev_init);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 6c844b9..60b6f6d 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -30,6 +30,7 @@ struct vhost_poll {
> wait_queue_head_t *wqh;
> wait_queue_entry_t wait;
> struct vhost_work work;
> + __u8 poll_id;
> __poll_t mask;
> struct vhost_dev *dev;
> };
> @@ -37,9 +38,10 @@ struct vhost_poll {
> void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
> void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
> bool vhost_has_work(struct vhost_dev *dev);
> +bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id);
>
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> - __poll_t mask, struct vhost_dev *dev);
> + __u8 id, __poll_t mask, struct vhost_dev *dev);
> int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> @@ -152,6 +154,8 @@ struct vhost_msg_node {
> struct list_head node;
> };
>
> +#define VHOST_DEV_MAX_VQ 128
> +
> struct vhost_dev {
> struct mm_struct *mm;
> struct mutex mutex;
> @@ -159,6 +163,7 @@ struct vhost_dev {
> int nvqs;
> struct eventfd_ctx *log_ctx;
> struct llist_head work_list;
> + DECLARE_BITMAP(work_pending, VHOST_DEV_MAX_VQ);
> struct task_struct *worker;
> struct vhost_umem *umem;
> struct vhost_umem *iotlb;
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v8 7/7] net: vhost: make busyloop_intr more accurate
From: Jason Wang @ 2018-08-21 0:47 UTC (permalink / raw)
To: xiangxia.m.yue, mst, makita.toshiaki; +Cc: netdev, virtualization
In-Reply-To: <f85bfa97-ab9c-2d51-2053-1fe6bb3d45bc@redhat.com>
On 2018年08月21日 08:33, Jason Wang wrote:
>
>
> On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>
>> The patch uses vhost_has_work_pending() to check if
>> the specified handler is scheduled, because in the most case,
>> vhost_has_work() return true when other side handler is added
>> to worker list. Use the vhost_has_work_pending() insead of
>> vhost_has_work().
>>
>> Topology:
>> [Host] ->linux bridge -> tap vhost-net ->[Guest]
>>
>> TCP_STREAM (netperf):
>> * Without the patch: 38035.39 Mbps, 3.37 us mean latency
>> * With the patch: 38409.44 Mbps, 3.34 us mean latency
>
> The improvement is not obvious as last version. Do you imply there's
> some recent changes of vhost that make it faster?
>
I misunderstood the numbers, please ignore this.
It shows less than 1% improvement. I'm not sure it's worth to do so. Can
you try bi-directional pktgen to see if it has more obvious effect?
Thanks
> Thanks
>
>>
>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> ---
>> drivers/vhost/net.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index db63ae2..b6939ef 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -487,10 +487,8 @@ static void vhost_net_busy_poll(struct vhost_net
>> *net,
>> endtime = busy_clock() + busyloop_timeout;
>> while (vhost_can_busy_poll(endtime)) {
>> - if (vhost_has_work(&net->dev)) {
>> - *busyloop_intr = true;
>> + if (vhost_has_work(&net->dev))
>> break;
>> - }
>> if ((sock_has_rx_data(sock) &&
>> !vhost_vq_avail_empty(&net->dev, rvq)) ||
>> @@ -513,6 +511,11 @@ static void vhost_net_busy_poll(struct vhost_net
>> *net,
>> !vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
>> vhost_net_enable_vq(net, rvq);
>> + if (vhost_has_work_pending(&net->dev,
>> + poll_rx ?
>> + VHOST_NET_VQ_RX: VHOST_NET_VQ_TX))
>> + *busyloop_intr = true;
>> +
>> mutex_unlock(&vq->mutex);
>> }
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v8 3/7] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-21 3:15 UTC (permalink / raw)
To: xiangxia.m.yue, mst, makita.toshiaki; +Cc: netdev, virtualization
In-Reply-To: <1534680686-3108-4-git-send-email-xiangxia.m.yue@gmail.com>
On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> Factor out generic busy polling logic and will be
> used for in tx path in the next patch. And with the patch,
> qemu can set differently the busyloop_timeout for rx queue.
>
> To avoid duplicate codes, introduce the helper functions:
> * sock_has_rx_data(changed from sk_has_rx_data)
> * vhost_net_busy_poll_try_queue
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 71 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 32c1b52..453c061 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -440,6 +440,75 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
> nvq->done_idx = 0;
> }
>
> +static int sock_has_rx_data(struct socket *sock)
> +{
> + if (unlikely(!sock))
> + return 0;
> +
> + if (sock->ops->peek_len)
> + return sock->ops->peek_len(sock);
> +
> + return skb_queue_empty(&sock->sk->sk_receive_queue);
> +}
> +
> +static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
> + struct vhost_virtqueue *vq)
> +{
> + if (!vhost_vq_avail_empty(&net->dev, vq)) {
> + vhost_poll_queue(&vq->poll);
> + } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> + vhost_disable_notify(&net->dev, vq);
> + vhost_poll_queue(&vq->poll);
> + }
> +}
> +
> +static void vhost_net_busy_poll(struct vhost_net *net,
> + struct vhost_virtqueue *rvq,
> + struct vhost_virtqueue *tvq,
> + bool *busyloop_intr,
> + bool poll_rx)
> +{
> + unsigned long busyloop_timeout;
> + unsigned long endtime;
> + struct socket *sock;
> + struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
> +
> + mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> + vhost_disable_notify(&net->dev, vq);
> + sock = rvq->private_data;
> +
> + busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
> + tvq->busyloop_timeout;
> +
> + preempt_disable();
> + endtime = busy_clock() + busyloop_timeout;
> +
> + while (vhost_can_busy_poll(endtime)) {
> + if (vhost_has_work(&net->dev)) {
> + *busyloop_intr = true;
> + break;
> + }
> +
> + if ((sock_has_rx_data(sock) &&
> + !vhost_vq_avail_empty(&net->dev, rvq)) ||
> + !vhost_vq_avail_empty(&net->dev, tvq))
> + break;
> +
> + cpu_relax();
> + }
> +
> + preempt_enable();
> +
> + if (poll_rx)
> + vhost_net_busy_poll_try_queue(net, tvq);
> + else if (sock_has_rx_data(sock))
> + vhost_net_busy_poll_try_queue(net, rvq);
This could be simplified like:
if (poll_rx || sock_has_rx_data(sock))
vhost_net_busy_poll_try_queue(net, vq);
Thanks
> + else /* On tx here, sock has no rx data. */
> + vhost_enable_notify(&net->dev, rvq);
> +
> + mutex_unlock(&vq->mutex);
> +}
> +
> static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> struct vhost_net_virtqueue *nvq,
> unsigned int *out_num, unsigned int *in_num,
> @@ -753,16 +822,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
> return len;
> }
>
> -static int sk_has_rx_data(struct sock *sk)
> -{
> - struct socket *sock = sk->sk_socket;
> -
> - if (sock->ops->peek_len)
> - return sock->ops->peek_len(sock);
> -
> - return skb_queue_empty(&sk->sk_receive_queue);
> -}
> -
> static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> bool *busyloop_intr)
> {
> @@ -770,41 +829,13 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> struct vhost_virtqueue *rvq = &rnvq->vq;
> struct vhost_virtqueue *tvq = &tnvq->vq;
> - unsigned long uninitialized_var(endtime);
> int len = peek_head_len(rnvq, sk);
>
> - if (!len && tvq->busyloop_timeout) {
> + if (!len && rvq->busyloop_timeout) {
> /* Flush batched heads first */
> vhost_net_signal_used(rnvq);
> /* Both tx vq and rx socket were polled here */
> - mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
> - vhost_disable_notify(&net->dev, tvq);
> -
> - preempt_disable();
> - endtime = busy_clock() + tvq->busyloop_timeout;
> -
> - while (vhost_can_busy_poll(endtime)) {
> - if (vhost_has_work(&net->dev)) {
> - *busyloop_intr = true;
> - break;
> - }
> - if ((sk_has_rx_data(sk) &&
> - !vhost_vq_avail_empty(&net->dev, rvq)) ||
> - !vhost_vq_avail_empty(&net->dev, tvq))
> - break;
> - cpu_relax();
> - }
> -
> - preempt_enable();
> -
> - if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> - vhost_poll_queue(&tvq->poll);
> - } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> - vhost_disable_notify(&net->dev, tvq);
> - vhost_poll_queue(&tvq->poll);
> - }
> -
> - mutex_unlock(&tvq->mutex);
> + vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
>
> len = peek_head_len(rnvq, sk);
> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* IEEE Record # 41985: 2018 3rd International Conference on Contemporary Computing and Informatics (IC3I).
From: Dr. S K Niranjan Aradhya @ 2018-08-22 4:34 UTC (permalink / raw)
To: virtualization
[-- Attachment #1.1: Type: text/plain, Size: 1581 bytes --]
*<< Apologies for cross-postings >><<< Please circulate among your friends,
peers and researchers >>>*
IEEE Conference Record No.: #41985;
2018 3rd International Conference on Contemporary Computing and Informatics
(IC3I).
Conference Date : 10 - 12 October 2018
Submission Deadline: 1 September 2018
*Submission Link:http://cmsweb.com.sg/ic3i18/index.php/ic3i18/ic3i18/login
<http://cmsweb.com.sg/ic3i18/index.php/ic3i18/ic3i18/login>*
IEEE ISBN : 978-1-5386-6894-8
IEEE Part No. : CFP18AWQ-ART
Selected, accepted and extended paper will be published in Scopus Indexed
International Journal of Forensic Software Engineering published by
InderScience
All accepted and presented papers will be submitted to the IEEE for
possible publication in IEEE Xplore Digital Library. Previous edition
indexed in: SCOPUS, ISI Web of Science, Engineering Index, Google, etc.
If you like to join the TPC or propose a special session or symposiums
please write to: secretariat@ic3i.org
General Chair(s)
IC3I 2018 Conference
----------------------
Disclaimer: We have clearly mentioned the subject lines and your email
address won't be misleading in any form. We have found your mail address
through our own efforts on the web search and not through any illegal way.
If you wish to remove your information from our mailing list or no longer
receive future announcements, please email with REMOVE in subject. Your
request to opt-out will be effective within a reasonable amount of time.
ic3i-cfp.pdf
<https://drive.google.com/file/d/1Tg7HlYL-7wwm0RB6BHrU0RqBaU_ol34F/view?usp=drive_web>
[-- Attachment #1.2: Type: text/html, Size: 2604 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048
From: Greg Edwards @ 2018-08-22 17:25 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: pbonzini, virtualization
In-Reply-To: <20180808233807-mutt-send-email-mst@kernel.org>
On Wed, Aug 08, 2018 at 11:42:22PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 08, 2018 at 01:29:55PM -0600, Greg Edwards wrote:
>> The current value of VHOST_SCSI_PREALLOC_PROT_SGLS is too small to
>> accommodate larger I/Os, e.g. 16-32 MiB, when the VIRTIO_SCSI_F_T10_PI
>> feature bit is negotiated and the backing store supports T10 PI.
>>
>> vhost-scsi rejects the command with errors like:
>>
>> [ 59.581317] vhost_scsi_calc_sgls: requested sgl_count: 1820 exceeds pre-allocated max_sgls: 512
>>
>> Signed-off-by: Greg Edwards <gedwards@ddn.com>
>> ---
>> drivers/vhost/scsi.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index 17fcd3b2e686..8c32cf58d6fa 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -56,7 +56,7 @@
>> #define VHOST_SCSI_DEFAULT_TAGS 256
>> #define VHOST_SCSI_PREALLOC_SGLS 2048
>> #define VHOST_SCSI_PREALLOC_UPAGES 2048
>> -#define VHOST_SCSI_PREALLOC_PROT_SGLS 512
>> +#define VHOST_SCSI_PREALLOC_PROT_SGLS 2048
>>
>> struct vhost_scsi_inflight {
>> /* Wait for the flush operation to finish */
>
> I guess it's ok since PREALLOC_SGLS is already 2K ... or
> am I missing something. Paolo, any input on this?
Ignore this patch. I believe I've identified the root cause [1], and
will send out a new patch once I've done some more testing.
Greg
[1] https://www.spinics.net/lists/linux-scsi/msg122825.html
^ permalink raw reply
* [PATCH] vhost/scsi: truncate T10 PI iov_iter to prot_bytes
From: Greg Edwards @ 2018-08-22 19:21 UTC (permalink / raw)
To: virtualization; +Cc: Michael S. Tsirkin, Mike Christie, Paolo Bonzini
Commands with protection information included were not truncating the
protection iov_iter to the number of protection bytes in the command.
This resulted in vhost_scsi mis-calculating the size of the protection
SGL in vhost_scsi_calc_sgls(), and including both the protection and
data SG entries in the protection SGL.
Fixes: 09b13fa8c1a1 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
---
drivers/vhost/scsi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 76f8d649147b..cbe0ea26c1ff 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -964,7 +964,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin);
}
/*
- * Set prot_iter to data_iter, and advance past any
+ * Set prot_iter to data_iter and truncate it to
+ * prot_bytes, and advance data_iter past any
* preceeding prot_bytes that may be present.
*
* Also fix up the exp_data_len to reflect only the
@@ -973,6 +974,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (prot_bytes) {
exp_data_len -= prot_bytes;
prot_iter = data_iter;
+ iov_iter_truncate(&prot_iter, prot_bytes);
iov_iter_advance(&data_iter, prot_bytes);
}
tag = vhost64_to_cpu(vq, v_req_pi.tag);
--
2.17.1
^ permalink raw reply related
* [PULL] virtio,vhost: fixes, tweaks
From: Michael S. Tsirkin @ 2018-08-23 17:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: peter.maydell, cdall, kvm, mst, jean-philippe.brucker, netdev,
penguin-kernel, linux-kernel, virtualization, marc.zyngier, akpm,
mhocko, suzuki.poulose
The following changes since commit 94710cac0ef4ee177a63b5227664b38c95bbf703:
Linux 4.18 (2018-08-12 13:41:04 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 864d39df09b43f9d09d80bc29d8e8888294b3c4b:
vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048 (2018-08-22 01:01:47 +0300)
----------------------------------------------------------------
virtio, vhost: fixes, tweaks
No new features but a bunch of tweaks such as
switching balloon from oom notifier to shrinker.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Greg Edwards (2):
vhost: allow vhost-scsi driver to be built-in
vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048
Suzuki K Poulose (2):
virtio: mmio-v1: Validate queue PFN
virtio: pci-legacy: Validate queue pfn
Wei Wang (3):
virtio-balloon: remove BUG() in init_vqs
virtio-balloon: kzalloc the vb struct
virtio_balloon: replace oom notifier with shrinker
drivers/vhost/Kconfig | 2 +-
drivers/vhost/scsi.c | 2 +-
drivers/virtio/virtio_balloon.c | 125 ++++++++++++++++++++-----------------
drivers/virtio/virtio_mmio.c | 20 +++++-
drivers/virtio/virtio_pci_legacy.c | 14 ++++-
5 files changed, 99 insertions(+), 64 deletions(-)
^ permalink raw reply
* [PATCH net] vhost: correctly check the iova range when waking virtqueue
From: Jason Wang @ 2018-08-24 8:53 UTC (permalink / raw)
To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization
We don't wakeup the virtqueue if the first byte of pending iova range
is the last byte of the range we just got updated. This will lead a
virtqueue to wait for IOTLB updating forever. Fixing by correct the
check and wake up the virtqueue in this case.
Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch is needed for -stable.
---
drivers/vhost/vhost.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 96c1d8400822..b13c6b4b2c66 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -952,7 +952,7 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
list_for_each_entry_safe(node, n, &d->pending_list, node) {
struct vhost_iotlb_msg *vq_msg = &node->msg.iotlb;
if (msg->iova <= vq_msg->iova &&
- msg->iova + msg->size - 1 > vq_msg->iova &&
+ msg->iova + msg->size - 1 >= vq_msg->iova &&
vq_msg->type == VHOST_IOTLB_MISS) {
vhost_poll_queue(&node->vq->poll);
list_del(&node->node);
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net] vhost: correctly check the iova range when waking virtqueue
From: Peter Xu @ 2018-08-24 9:36 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180824085313.21798-1-jasowang@redhat.com>
On Fri, Aug 24, 2018 at 04:53:13PM +0800, Jason Wang wrote:
> We don't wakeup the virtqueue if the first byte of pending iova range
> is the last byte of the range we just got updated. This will lead a
> virtqueue to wait for IOTLB updating forever. Fixing by correct the
> check and wake up the virtqueue in this case.
>
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Without this patch, this command will trigger the IO hang merely every
time from host to guest:
netperf -H 1.2.3.4 -l 5 -t TCP_RR -- -b 100
After applying, I can run it 10 times continuously without a problem.
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Thanks,
--
Peter Xu
^ permalink raw reply
* Re: [PATCH net] vhost: correctly check the iova range when waking virtqueue
From: Michael S. Tsirkin @ 2018-08-24 11:02 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180824085313.21798-1-jasowang@redhat.com>
On Fri, Aug 24, 2018 at 04:53:13PM +0800, Jason Wang wrote:
> We don't wakeup the virtqueue if the first byte of pending iova range
> is the last byte of the range we just got updated. This will lead a
> virtqueue to wait for IOTLB updating forever. Fixing by correct the
> check and wake up the virtqueue in this case.
>
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> The patch is needed for -stable.
> ---
> drivers/vhost/vhost.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 96c1d8400822..b13c6b4b2c66 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -952,7 +952,7 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
> list_for_each_entry_safe(node, n, &d->pending_list, node) {
> struct vhost_iotlb_msg *vq_msg = &node->msg.iotlb;
> if (msg->iova <= vq_msg->iova &&
> - msg->iova + msg->size - 1 > vq_msg->iova &&
> + msg->iova + msg->size - 1 >= vq_msg->iova &&
> vq_msg->type == VHOST_IOTLB_MISS) {
> vhost_poll_queue(&node->vq->poll);
> list_del(&node->node);
> --
> 2.17.1
^ permalink raw reply
* Re: [PATCH v2 00/11] x86/paravirt: several cleanups
From: Juergen Gross @ 2018-08-24 13:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: boris.ostrovsky, rusty, peterz, mingo, hpa, akataria, tglx
In-Reply-To: <20180813073739.26108-1-jgross@suse.com>
On 13/08/18 09:37, Juergen Gross wrote:
> This series removes some no longer needed stuff from paravirt
> infrastructure and puts large quantities of paravirt ops under a new
> config option PARAVIRT_XXL which is selected by XEN_PV only.
>
> A pvops kernel without XEN_PV being configured is about 2.5% smaller
> with this series applied.
>
> tip commit 5800dc5c19f34e6e03b5adab1282535cb102fafd ("x86/paravirt:
> Fix spectre-v2 mitigations for paravirt guests") is a prerequisite
> for this series.
>
> The last 4 patches of this series require my Xen cleanup series
> https://lore.kernel.org/lkml/20180717120113.12756-1-jgross@suse.com/
> which hides more Xen PV-only code behind CONFIG_XEN_PV.
>
> Changes in V2:
> - patch 4: shorten pv_ops sub-structure names (Jan Beulich)
> - patch 11: new patch
>
> Juergen Gross (11):
> x86/paravirt: make paravirt_patch_call() and paravirt_patch_jmp()
> static
> x86/paravirt: remove clobbers parameter from paravirt patch functions
> x86/paravirt: remove clobbers from struct paravirt_patch_site
> x86/paravirt: use a single ops structure
> x86/paravirt: remove unused paravirt bits
> x86/paravirt: introduce new config option PARAVIRT_XXL
> x86/paravirt: move items in pv_info under PARAVIRT_XXL umbrella
> x86/paravirt: move the Xen-only pv_cpu_ops under the PARAVIRT_XXL
> umbrella
> x86/paravirt: move the Xen-only pv_irq_ops under the PARAVIRT_XXL
> umbrella
> x86/paravirt: move the Xen-only pv_mmu_ops under the PARAVIRT_XXL
> umbrella
> x86/paravirt: remove unneeded mmu related paravirt ops bits
>
> arch/x86/Kconfig | 3 +
> arch/x86/hyperv/mmu.c | 4 +-
> arch/x86/include/asm/debugreg.h | 2 +-
> arch/x86/include/asm/desc.h | 4 +-
> arch/x86/include/asm/fixmap.h | 2 +-
> arch/x86/include/asm/irqflags.h | 56 ++--
> arch/x86/include/asm/mmu_context.h | 4 +-
> arch/x86/include/asm/msr.h | 4 +-
> arch/x86/include/asm/paravirt.h | 399 +++++++++++++---------------
> arch/x86/include/asm/paravirt_types.h | 77 +++---
> arch/x86/include/asm/pgalloc.h | 2 +-
> arch/x86/include/asm/pgtable-3level_types.h | 2 +-
> arch/x86/include/asm/pgtable.h | 7 +-
> arch/x86/include/asm/processor.h | 4 +-
> arch/x86/include/asm/ptrace.h | 3 +-
> arch/x86/include/asm/segment.h | 2 +-
> arch/x86/include/asm/special_insns.h | 4 +-
> arch/x86/kernel/alternative.c | 2 +-
> arch/x86/kernel/asm-offsets.c | 13 +-
> arch/x86/kernel/asm-offsets_64.c | 9 +-
> arch/x86/kernel/cpu/common.c | 4 +-
> arch/x86/kernel/cpu/vmware.c | 4 +-
> arch/x86/kernel/head_64.S | 2 +-
> arch/x86/kernel/kvm.c | 17 +-
> arch/x86/kernel/kvmclock.c | 4 +-
> arch/x86/kernel/paravirt-spinlocks.c | 15 +-
> arch/x86/kernel/paravirt.c | 292 ++++++++++----------
> arch/x86/kernel/paravirt_patch_32.c | 57 ++--
> arch/x86/kernel/paravirt_patch_64.c | 65 ++---
> arch/x86/kernel/tsc.c | 2 +-
> arch/x86/kernel/vsmp_64.c | 24 +-
> arch/x86/xen/Kconfig | 1 +
> arch/x86/xen/enlighten_pv.c | 31 ++-
> arch/x86/xen/irq.c | 2 +-
> arch/x86/xen/mmu_hvm.c | 2 +-
> arch/x86/xen/mmu_pv.c | 28 +-
> arch/x86/xen/spinlock.c | 11 +-
> arch/x86/xen/time.c | 4 +-
> drivers/xen/time.c | 2 +-
> 39 files changed, 575 insertions(+), 595 deletions(-)
>
Ping?
Juergen
^ permalink raw reply
* Re: [PATCH v2 01/11] x86/paravirt: make paravirt_patch_call() and paravirt_patch_jmp() static
From: Thomas Gleixner @ 2018-08-24 14:00 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, peterz, x86, linux-kernel, virtualization, mingo, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180813073739.26108-2-jgross@suse.com>
On Mon, 13 Aug 2018, Juergen Gross wrote:
> paravirt_patch_call() and paravirt_patch_jmp() are used in paravirt.c
> only. Convert them to static.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply
* Re: [PATCH v2 02/11] x86/paravirt: remove clobbers parameter from paravirt patch functions
From: Thomas Gleixner @ 2018-08-24 14:01 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, peterz, x86, linux-kernel, virtualization, mingo, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180813073739.26108-3-jgross@suse.com>
On Mon, 13 Aug 2018, Juergen Gross wrote:
> The clobbers parameter from paravirt_patch_default() et al isn't used
> any longer. Remove it.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply
* Re: [PATCH v2 03/11] x86/paravirt: remove clobbers from struct paravirt_patch_site
From: Thomas Gleixner @ 2018-08-24 14:03 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, peterz, x86, linux-kernel, virtualization, mingo, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180813073739.26108-4-jgross@suse.com>
On Mon, 13 Aug 2018, Juergen Gross wrote:
> There is no need any longer to store the clobbers in struct
> paravirt_patch_site. Remove clobbers from the struct and from the
> related macros.
>
> While at it fix some lines longer than 80 characters.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply
* Re: [PATCH v2 09/11] x86/paravirt: move the Xen-only pv_irq_ops under the PARAVIRT_XXL umbrella
From: Peter Zijlstra @ 2018-08-24 14:10 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, x86, linux-kernel, virtualization, mingo, tglx, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180813073739.26108-10-jgross@suse.com>
On Mon, Aug 13, 2018 at 09:37:37AM +0200, Juergen Gross wrote:
> Some of the paravirt ops defined in pv_irq_ops are for Xen PV guests
> only. Define them only if CONFIG_PARAVIRT_XXL is set.
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index e652ec27d945..ae53ee36d8fb 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -197,8 +197,10 @@ struct pv_irq_ops {
> struct paravirt_callee_save irq_disable;
> struct paravirt_callee_save irq_enable;
>
> +#ifdef CONFIG_PARAVIRT_XXL
> void (*safe_halt)(void);
> void (*halt)(void);
> +#endif
that makes me sad... but it appears VSMP also uses them. Can't you
simply make VSMP also select XXL, I don't think that's used quite as
much as Xen is :-)
^ permalink raw reply
* Re: [PATCH v2 10/11] x86/paravirt: move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
From: Peter Zijlstra @ 2018-08-24 14:12 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, x86, linux-kernel, virtualization, mingo, tglx, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180813073739.26108-11-jgross@suse.com>
On Mon, Aug 13, 2018 at 09:37:38AM +0200, Juergen Gross wrote:
> struct pv_mmu_ops {
> + /* TLB operations */
> + void (*flush_tlb_user)(void);
> + void (*flush_tlb_kernel)(void);
> + void (*flush_tlb_one_user)(unsigned long addr);
> + void (*flush_tlb_others)(const struct cpumask *cpus,
> + const struct flush_tlb_info *info);
> +
> + /* Hook for intercepting the destruction of an mm_struct. */
> + void (*exit_mmap)(struct mm_struct *mm);
Right, so I just wrecked that for you by adding a new:
tlb_remove_table virt function. But I don't suppose that's a difficult
thing to fix up.
^ permalink raw reply
* Re: [PATCH v2 00/11] x86/paravirt: several cleanups
From: Peter Zijlstra @ 2018-08-24 14:13 UTC (permalink / raw)
To: Juergen Gross
Cc: rusty, x86, linux-kernel, virtualization, mingo, tglx, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <45bfe8ab-683f-ab79-e3c6-c0a707b667c2@suse.com>
On Fri, Aug 24, 2018 at 03:52:55PM +0200, Juergen Gross wrote:
> Ping?
Looking good; although I messed it up a little bit by adding a new
paravirt function.
Thanks for doing this!
^ permalink raw reply
* Re: [PATCH v2 09/11] x86/paravirt: move the Xen-only pv_irq_ops under the PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-24 14:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: rusty, x86, linux-kernel, virtualization, mingo, tglx, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180824141045.GO24124@hirez.programming.kicks-ass.net>
On 24/08/18 16:10, Peter Zijlstra wrote:
> On Mon, Aug 13, 2018 at 09:37:37AM +0200, Juergen Gross wrote:
>> Some of the paravirt ops defined in pv_irq_ops are for Xen PV guests
>> only. Define them only if CONFIG_PARAVIRT_XXL is set.
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> index e652ec27d945..ae53ee36d8fb 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -197,8 +197,10 @@ struct pv_irq_ops {
>> struct paravirt_callee_save irq_disable;
>> struct paravirt_callee_save irq_enable;
>>
>> +#ifdef CONFIG_PARAVIRT_XXL
>> void (*safe_halt)(void);
>> void (*halt)(void);
>> +#endif
>
> that makes me sad... but it appears VSMP also uses them. Can't you
> simply make VSMP also select XXL, I don't think that's used quite as
> much as Xen is :-)
>
Sure, why not?
Any objections?
Juergen
^ permalink raw reply
* Re: [PATCH v2 10/11] x86/paravirt: move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-24 14:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: rusty, x86, linux-kernel, virtualization, mingo, tglx, hpa,
xen-devel, akataria, boris.ostrovsky
In-Reply-To: <20180824141218.GP24124@hirez.programming.kicks-ass.net>
On 24/08/18 16:12, Peter Zijlstra wrote:
> On Mon, Aug 13, 2018 at 09:37:38AM +0200, Juergen Gross wrote:
>> struct pv_mmu_ops {
>> + /* TLB operations */
>> + void (*flush_tlb_user)(void);
>> + void (*flush_tlb_kernel)(void);
>> + void (*flush_tlb_one_user)(unsigned long addr);
>> + void (*flush_tlb_others)(const struct cpumask *cpus,
>> + const struct flush_tlb_info *info);
>> +
>> + /* Hook for intercepting the destruction of an mm_struct. */
>> + void (*exit_mmap)(struct mm_struct *mm);
>
> Right, so I just wrecked that for you by adding a new:
> tlb_remove_table virt function. But I don't suppose that's a difficult
> thing to fix up.
Right. This will stay outside of XXL, I think. :-)
Juergen
^ permalink raw reply
* Re: [PATCH net] vhost: correctly check the iova range when waking virtqueue
From: David Miller @ 2018-08-26 0:40 UTC (permalink / raw)
To: jasowang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <20180824085313.21798-1-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Fri, 24 Aug 2018 16:53:13 +0800
> We don't wakeup the virtqueue if the first byte of pending iova range
> is the last byte of the range we just got updated. This will lead a
> virtqueue to wait for IOTLB updating forever. Fixing by correct the
> check and wake up the virtqueue in this case.
>
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch is needed for -stable.
Applied and queued up for -stable, thanks Jason.
^ permalink raw reply
* [PATCH v37 0/3] Virtio-balloon: support free page reporting
From: Wei Wang @ 2018-08-27 1:32 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm, dgilbert
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, pbonzini,
nilal, torvalds
The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this
series enables the virtio-balloon driver to report hints of guest free
pages to host. It can be used to accelerate virtual machine (VM) live
migration. Here is an introduction of this usage:
Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to have the hypervisor write-protect all the guest memory.
This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.
* Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms
2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
(i.e. make bzImage -j4) running
1) Live Migration Time:
Optimization v.s. Legacy = 2273ms v.s. 4502ms
--> ~50% reduction
2) Linux Compilation Time:
Optimization v.s. Legacy = 8min42s v.s. 8min43s
--> no obvious difference
2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
2) Linux Compilation Time:
Optimization v.s. Legacy = 8min30s v.s. 8min36s
--> no obvious difference
ChangeLog:
v36->v37:
- free the reported pages to mm when receives a DONE cmd from host.
Please see patch 1's commit log for reasons. Please see patch 1's
commit for detailed explanations.
For ChangeLogs from v22 to v36, please reference
https://lkml.org/lkml/2018/7/20/199
For ChangeLogs before v21, please reference
https://lwn.net/Articles/743660/
Wei Wang (3):
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
mm/page_poison: expose page_poisoning_enabled to kernel modules
virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
drivers/virtio/virtio_balloon.c | 374 ++++++++++++++++++++++++++++++++----
include/uapi/linux/virtio_balloon.h | 8 +
mm/page_poison.c | 6 +
3 files changed, 355 insertions(+), 33 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
From: Wei Wang @ 2018-08-27 1:32 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm, dgilbert
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, pbonzini,
nilal, torvalds
In-Reply-To: <1535333539-32420-1-git-send-email-wei.w.wang@intel.com>
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function.
Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.
Host may also send a stop cmd id to the guest to stop the reporting.
VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest
reporting.
VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
the reported pages are ready to be freed.
Why does the guest free the reported pages when host tells it is ready to
free?
This is because freeing pages appears to be expensive for live migration.
free_pages() dirties memory very quickly and makes the live migraion not
converge in some cases. So it is good to delay the free_page operation
when the migration is done, and host sends a command to guest about that.
Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
VIRTIO_BALLOON_CMD_ID_STOP?
This is because live migration is usually done in several rounds. At the
end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
the guest to stop (or say pause) the reporting. The guest resumes the
reporting when it receives a new command id at the beginning of the next
round. So we need a new cmd id to distinguish between "stop reporting" and
"ready to free the reported pages".
TODO:
- Add a batch page allocation API to amortize the allocation overhead.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
drivers/virtio/virtio_balloon.c | 364 ++++++++++++++++++++++++++++++++----
include/uapi/linux/virtio_balloon.h | 5 +
2 files changed, 336 insertions(+), 33 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1c1f62..a185678 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -41,13 +41,34 @@
#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+ __GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+ (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
#ifdef CONFIG_BALLOON_COMPACTION
static struct vfsmount *balloon_mnt;
#endif
+enum virtio_balloon_vq {
+ VIRTIO_BALLOON_VQ_INFLATE,
+ VIRTIO_BALLOON_VQ_DEFLATE,
+ VIRTIO_BALLOON_VQ_STATS,
+ VIRTIO_BALLOON_VQ_FREE_PAGE,
+ VIRTIO_BALLOON_VQ_MAX
+};
+
struct virtio_balloon {
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+ /* Balloon's own wq for cpu-intensive work items */
+ struct workqueue_struct *balloon_wq;
+ /* The free page reporting work item submitted to the balloon wq */
+ struct work_struct report_free_page_work;
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -57,6 +78,18 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
+ /* The list of allocated free pages, waiting to be given back to mm */
+ struct list_head free_page_list;
+ spinlock_t free_page_list_lock;
+ /* The number of free page blocks on the above list */
+ unsigned long num_free_page_blocks;
+ /* The cmd id received from host */
+ u32 cmd_id_received;
+ /* The cmd id that is actively in use */
+ __virtio32 cmd_id_active;
+ /* Buffer to store the stop sign */
+ __virtio32 cmd_id_stop;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
@@ -320,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
}
-static void virtballoon_changed(struct virtio_device *vdev)
-{
- struct virtio_balloon *vb = vdev->priv;
- unsigned long flags;
-
- spin_lock_irqsave(&vb->stop_update_lock, flags);
- if (!vb->stop_update)
- queue_work(system_freezable_wq, &vb->update_balloon_size_work);
- spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
static inline s64 towards_target(struct virtio_balloon *vb)
{
s64 target;
@@ -347,6 +369,60 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
}
+/* Gives back @num_to_return blocks of free pages to mm. */
+static unsigned long return_free_pages_to_mm(struct virtio_balloon *vb,
+ unsigned long num_to_return)
+{
+ struct page *page;
+ unsigned long num_returned;
+
+ spin_lock_irq(&vb->free_page_list_lock);
+ for (num_returned = 0; num_returned < num_to_return; num_returned++) {
+ page = balloon_page_pop(&vb->free_page_list);
+ if (!page)
+ break;
+ free_pages((unsigned long)page_address(page),
+ VIRTIO_BALLOON_FREE_PAGE_ORDER);
+ }
+ vb->num_free_page_blocks -= num_returned;
+ spin_unlock_irq(&vb->free_page_list_lock);
+
+ return num_returned;
+}
+
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+ struct virtio_balloon *vb = vdev->priv;
+ unsigned long flags;
+ s64 diff = towards_target(vb);
+
+ if (diff) {
+ spin_lock_irqsave(&vb->stop_update_lock, flags);
+ if (!vb->stop_update)
+ queue_work(system_freezable_wq,
+ &vb->update_balloon_size_work);
+ spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+ }
+
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ virtio_cread(vdev, struct virtio_balloon_config,
+ free_page_report_cmd_id, &vb->cmd_id_received);
+ if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
+ /* Pass ULONG_MAX to give back all the free pages */
+ return_free_pages_to_mm(vb, ULONG_MAX);
+ } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
+ vb->cmd_id_received !=
+ virtio32_to_cpu(vdev, vb->cmd_id_active)) {
+ spin_lock_irqsave(&vb->stop_update_lock, flags);
+ if (!vb->stop_update) {
+ queue_work(vb->balloon_wq,
+ &vb->report_free_page_work);
+ }
+ spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+ }
+ }
+}
+
static void update_balloon_size(struct virtio_balloon *vb)
{
u32 actual = vb->num_pages;
@@ -389,26 +465,44 @@ static void update_balloon_size_func(struct work_struct *work)
static int init_vqs(struct virtio_balloon *vb)
{
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
- static const char * const names[] = { "inflate", "deflate", "stats" };
- int err, nvqs;
+ struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+ vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+ const char *names[VIRTIO_BALLOON_VQ_MAX];
+ int err;
/*
- * We expect two virtqueues: inflate and deflate, and
- * optionally stat.
+ * Inflateq and deflateq are used unconditionally. The names[]
+ * will be NULL if the related feature is not enabled, which will
+ * cause no allocation for the corresponding virtqueue in find_vqs.
*/
- nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
- err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
+ callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
+ names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
+ callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
+ names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
+ names[VIRTIO_BALLOON_VQ_STATS] = NULL;
+ names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
+ names[VIRTIO_BALLOON_VQ_STATS] = "stats";
+ callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
+ }
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
+ callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+ }
+
+ err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
+ vqs, callbacks, names, NULL, NULL);
if (err)
return err;
- vb->inflate_vq = vqs[0];
- vb->deflate_vq = vqs[1];
+ vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
+ vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
struct scatterlist sg;
unsigned int num_stats;
- vb->stats_vq = vqs[2];
+ vb->stats_vq = vqs[VIRTIO_BALLOON_VQ_STATS];
/*
* Prime this virtqueue with one buffer so the hypervisor can
@@ -426,9 +520,145 @@ static int init_vqs(struct virtio_balloon *vb)
}
virtqueue_kick(vb->stats_vq);
}
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
+
+ return 0;
+}
+
+static int send_cmd_id_start(struct virtio_balloon *vb)
+{
+ struct scatterlist sg;
+ struct virtqueue *vq = vb->free_page_vq;
+ int err, unused;
+
+ /* Detach all the used buffers from the vq */
+ while (virtqueue_get_buf(vq, &unused))
+ ;
+
+ vb->cmd_id_active = cpu_to_virtio32(vb->vdev, vb->cmd_id_received);
+ sg_init_one(&sg, &vb->cmd_id_active, sizeof(vb->cmd_id_active));
+ err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_active, GFP_KERNEL);
+ if (!err)
+ virtqueue_kick(vq);
+ return err;
+}
+
+static int send_cmd_id_stop(struct virtio_balloon *vb)
+{
+ struct scatterlist sg;
+ struct virtqueue *vq = vb->free_page_vq;
+ int err, unused;
+
+ /* Detach all the used buffers from the vq */
+ while (virtqueue_get_buf(vq, &unused))
+ ;
+
+ sg_init_one(&sg, &vb->cmd_id_stop, sizeof(vb->cmd_id_stop));
+ err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_stop, GFP_KERNEL);
+ if (!err)
+ virtqueue_kick(vq);
+ return err;
+}
+
+static int get_free_page_and_send(struct virtio_balloon *vb)
+{
+ struct virtqueue *vq = vb->free_page_vq;
+ struct page *page;
+ struct scatterlist sg;
+ int err, unused;
+ void *p;
+
+ /* Detach all the used buffers from the vq */
+ while (virtqueue_get_buf(vq, &unused))
+ ;
+
+ page = alloc_pages(VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG,
+ VIRTIO_BALLOON_FREE_PAGE_ORDER);
+ /*
+ * When the allocation returns NULL, it indicates that we have got all
+ * the possible free pages, so return -EINTR to stop.
+ */
+ if (!page)
+ return -EINTR;
+
+ p = page_address(page);
+ sg_init_one(&sg, p, VIRTIO_BALLOON_FREE_PAGE_SIZE);
+ /* There is always 1 entry reserved for the cmd id to use. */
+ if (vq->num_free > 1) {
+ err = virtqueue_add_inbuf(vq, &sg, 1, p, GFP_KERNEL);
+ if (unlikely(err)) {
+ free_pages((unsigned long)p,
+ VIRTIO_BALLOON_FREE_PAGE_ORDER);
+ return err;
+ }
+ virtqueue_kick(vq);
+ spin_lock_irq(&vb->free_page_list_lock);
+ balloon_page_push(&vb->free_page_list, page);
+ vb->num_free_page_blocks++;
+ spin_unlock_irq(&vb->free_page_list_lock);
+ } else {
+ /*
+ * The vq has no available entry to add this page block, so
+ * just free it.
+ */
+ free_pages((unsigned long)p, VIRTIO_BALLOON_FREE_PAGE_ORDER);
+ }
+
return 0;
}
+static int send_free_pages(struct virtio_balloon *vb)
+{
+ int err;
+ u32 cmd_id_active;
+
+ while (1) {
+ /*
+ * If a stop id or a new cmd id was just received from host,
+ * stop the reporting.
+ */
+ cmd_id_active = virtio32_to_cpu(vb->vdev, vb->cmd_id_active);
+ if (cmd_id_active != vb->cmd_id_received)
+ break;
+
+ /*
+ * The free page blocks are allocated and sent to host one by
+ * one.
+ */
+ err = get_free_page_and_send(vb);
+ if (err == -EINTR)
+ break;
+ else if (unlikely(err))
+ return err;
+ }
+
+ return 0;
+}
+
+static void report_free_page_func(struct work_struct *work)
+{
+ int err;
+ struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
+ report_free_page_work);
+ struct device *dev = &vb->vdev->dev;
+
+ /* Start by sending the received cmd id to host with an outbuf. */
+ err = send_cmd_id_start(vb);
+ if (unlikely(err))
+ dev_err(dev, "Failed to send a start id, err = %d\n", err);
+
+ err = send_free_pages(vb);
+ if (unlikely(err))
+ dev_err(dev, "Failed to send a free page, err = %d\n", err);
+
+ /* End by sending a stop id to host with an outbuf. */
+ err = send_cmd_id_stop(vb);
+ if (unlikely(err))
+ dev_err(dev, "Failed to send a stop id, err = %d\n", err);
+}
+
#ifdef CONFIG_BALLOON_COMPACTION
/*
* virtballoon_migratepage - perform the balloon page migration on behalf of
@@ -512,14 +742,23 @@ static struct file_system_type balloon_fs = {
#endif /* CONFIG_BALLOON_COMPACTION */
-static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
- struct shrink_control *sc)
+static unsigned long shrink_free_pages(struct virtio_balloon *vb,
+ unsigned long pages_to_free)
{
- unsigned long pages_to_free, pages_freed = 0;
- struct virtio_balloon *vb = container_of(shrinker,
- struct virtio_balloon, shrinker);
+ unsigned long blocks_to_free, blocks_freed;
- pages_to_free = sc->nr_to_scan * VIRTIO_BALLOON_PAGES_PER_PAGE;
+ pages_to_free = round_up(pages_to_free,
+ 1 << VIRTIO_BALLOON_FREE_PAGE_ORDER);
+ blocks_to_free = pages_to_free >> VIRTIO_BALLOON_FREE_PAGE_ORDER;
+ blocks_freed = return_free_pages_to_mm(vb, blocks_to_free);
+
+ return blocks_freed << VIRTIO_BALLOON_FREE_PAGE_ORDER;
+}
+
+static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
+ unsigned long pages_to_free)
+{
+ unsigned long pages_freed = 0;
/*
* One invocation of leak_balloon can deflate at most
@@ -527,12 +766,33 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
* multiple times to deflate pages till reaching pages_to_free.
*/
while (vb->num_pages && pages_to_free) {
+ pages_freed += leak_balloon(vb, pages_to_free) /
+ VIRTIO_BALLOON_PAGES_PER_PAGE;
pages_to_free -= pages_freed;
- pages_freed += leak_balloon(vb, pages_to_free);
}
update_balloon_size(vb);
- return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+ return pages_freed;
+}
+
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ unsigned long pages_to_free, pages_freed = 0;
+ struct virtio_balloon *vb = container_of(shrinker,
+ struct virtio_balloon, shrinker);
+
+ pages_to_free = sc->nr_to_scan * VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ pages_freed = shrink_free_pages(vb, pages_to_free);
+
+ if (pages_freed >= pages_to_free)
+ return pages_freed;
+
+ pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
+
+ return pages_freed;
}
static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
@@ -540,8 +800,12 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
{
struct virtio_balloon *vb = container_of(shrinker,
struct virtio_balloon, shrinker);
+ unsigned long count;
- return vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
+ count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
+ count += vb->num_free_page_blocks >> VIRTIO_BALLOON_FREE_PAGE_ORDER;
+
+ return count;
}
static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
@@ -604,6 +868,31 @@ static int virtballoon_probe(struct virtio_device *vdev)
}
vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
#endif
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ /*
+ * There is always one entry reserved for cmd id, so the ring
+ * size needs to be at least two to report free page hints.
+ */
+ if (virtqueue_get_vring_size(vb->free_page_vq) < 2) {
+ err = -ENOSPC;
+ goto out_del_vqs;
+ }
+ vb->balloon_wq = alloc_workqueue("balloon-wq",
+ WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
+ if (!vb->balloon_wq) {
+ err = -ENOMEM;
+ goto out_del_vqs;
+ }
+ INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+ vb->cmd_id_received = VIRTIO_BALLOON_CMD_ID_STOP;
+ vb->cmd_id_active = cpu_to_virtio32(vb->vdev,
+ VIRTIO_BALLOON_CMD_ID_STOP);
+ vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
+ VIRTIO_BALLOON_CMD_ID_STOP);
+ vb->num_free_page_blocks = 0;
+ spin_lock_init(&vb->free_page_list_lock);
+ INIT_LIST_HEAD(&vb->free_page_list);
+ }
/*
* We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
* shrinker needs to be registered to relieve memory pressure.
@@ -611,7 +900,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
err = virtio_balloon_register_shrinker(vb);
if (err)
- goto out_del_vqs;
+ goto out_del_balloon_wq;
}
virtio_device_ready(vdev);
@@ -619,6 +908,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
virtballoon_changed(vdev);
return 0;
+out_del_balloon_wq:
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ destroy_workqueue(vb->balloon_wq);
out_del_vqs:
vdev->config->del_vqs(vdev);
out_free_vb:
@@ -652,6 +944,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_size_work);
cancel_work_sync(&vb->update_balloon_stats_work);
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ cancel_work_sync(&vb->report_free_page_work);
+ destroy_workqueue(vb->balloon_wq);
+ }
+
remove_common(vb);
#ifdef CONFIG_BALLOON_COMPACTION
if (vb->vb_dev_info.inode)
@@ -703,6 +1000,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_MUST_TELL_HOST,
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+ VIRTIO_BALLOON_F_FREE_PAGE_HINT,
};
static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 13b8cb5..47c9eb4 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,15 +34,20 @@
#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
#define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
+#define VIRTIO_BALLOON_CMD_ID_STOP 0
+#define VIRTIO_BALLOON_CMD_ID_DONE 1
struct virtio_balloon_config {
/* Number of pages host wants Guest to give up. */
__u32 num_pages;
/* Number of pages we've actually got in balloon. */
__u32 actual;
+ /* Free page report command id, readonly by guest */
+ __u32 free_page_report_cmd_id;
};
#define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */
--
2.7.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox