From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NCXJ9-0006cl-7p for qemu-devel@nongnu.org; Mon, 23 Nov 2009 06:42:15 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1NCXIy-0006YC-V9 for qemu-devel@nongnu.org; Mon, 23 Nov 2009 06:42:13 -0500 Received: from [199.232.76.173] (port=53687 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NCXIy-0006Y9-KK for qemu-devel@nongnu.org; Mon, 23 Nov 2009 06:42:04 -0500 Received: from mx1.redhat.com ([209.132.183.28]:53180) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NCX9S-0004nw-CM for qemu-devel@nongnu.org; Mon, 23 Nov 2009 06:32:14 -0500 Message-ID: <4B0A72AF.80702@redhat.com> Date: Mon, 23 Nov 2009 13:31:59 +0200 From: Vadim Rozenfeld MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver (V3) References: <1258643169.3464.3.camel@aglitke> <1258643945.3464.5.camel@aglitke> <20091123094425.GB3748@redhat.com> <4B0A6B59.6030102@redhat.com> In-Reply-To: <4B0A6B59.6030102@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: dlaor@redhat.com Cc: Anthony Liguori , "Michael S. Tsirkin" , qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Avi Kivity , Adam Litke On 11/23/2009 01:00 PM, Dor Laor wrote: > On 11/23/2009 11:44 AM, Michael S. Tsirkin wrote: >> On Thu, Nov 19, 2009 at 09:19:05AM -0600, Adam Litke wrote: >>> Rusty and Anthony, >>> If I've addressed all outstanding issues, please consider this patch=20 >>> for >>> inclusion. Thanks. >>> >>> Changes since V2: >>> - Increase stat field size to 64 bits >>> - Report all sizes in kb (not pages) >>> - Drop anon_pages stat and fix endianness conversion >>> >>> Changes since V1: >>> - Use a virtqueue instead of the device config space >>> >>> When using ballooning to manage overcommitted memory on a host, a=20 >>> system for >>> guests to communicate their memory usage to the host can provide=20 >>> information >>> that will minimize the impact of ballooning on the guests. The=20 >>> current method >>> employs a daemon running in each guest that communicates memory=20 >>> statistics to a >>> host daemon at a specified time interval. The host daemon=20 >>> aggregates this >>> information and inflates and/or deflates balloons according to the=20 >>> level of >>> host memory pressure. This approach is effective but overly complex=20 >>> since a >>> daemon must be installed inside each guest and coordinated to=20 >>> communicate with >>> the host. A simpler approach is to collect memory statistics in the=20 >>> virtio >>> balloon driver and communicate them directly to the hypervisor. >>> >>> This patch enables the guest-side support by adding stats collection=20 >>> and >>> reporting to the virtio balloon driver. >>> >>> Signed-off-by: Adam Litke >>> Cc: Rusty Russell >>> Cc: Anthony Liguori >>> Cc: virtualization@lists.linux-foundation.org >>> Cc: linux-kernel@vger.kernel.org >> >> that's pretty clean. some comments: I wrote them while off-line, and n= ow >> that I was going to send it out, I see that there's some overlap with >> what Rusty wrote. Still, here it is: >> >>> diff --git a/drivers/virtio/virtio_balloon.c=20 >>> b/drivers/virtio/virtio_balloon.c >>> index 200c22f..ebc9d39 100644 >>> --- a/drivers/virtio/virtio_balloon.c >>> +++ b/drivers/virtio/virtio_balloon.c >>> @@ -29,7 +29,7 @@ >>> struct virtio_balloon >>> { >>> struct virtio_device *vdev; >>> - struct virtqueue *inflate_vq, *deflate_vq; >>> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; >>> >>> /* Where the ballooning thread waits for config to change. */ >>> wait_queue_head_t config_change; >>> @@ -50,6 +50,9 @@ struct virtio_balloon >>> /* The array of pfns we tell the Host about. */ >>> unsigned int num_pfns; >>> u32 pfns[256]; >>> + >>> + /* Memory statistics */ >>> + struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; >>> }; >>> >>> static struct virtio_device_id id_table[] =3D { >>> @@ -155,6 +158,57 @@ static void leak_balloon(struct virtio_balloon=20 >>> *vb, size_t num) >>> } >>> } >>> >>> +static inline void update_stat(struct virtio_balloon *vb, int idx, >>> + __le16 tag, __le64 val) >>> +{ >>> + BUG_ON(idx>=3D VIRTIO_BALLOON_S_NR); >>> + vb->stats[idx].tag =3D cpu_to_le16(tag); >>> + vb->stats[idx].val =3D cpu_to_le64(val); >> >> you should do le16_to_cpu etc on __le values. >> Try running this patch through sparce checker. >> >>> +} >>> + >>> +#define K(x) ((x)<< (PAGE_SHIFT - 10)) >> >> can't this overflow? >> also, won't it be simpler to just report memory is >> in bytes, then just x * PAGE_SIZE instead of hairy constants? >> >>> +static void update_balloon_stats(struct virtio_balloon *vb) >>> +{ >>> + unsigned long events[NR_VM_EVENT_ITEMS]; >> >> that's about 1/2K worth of stack space on a 64 bit machine. >> better keep it as part of struct virtio_balloon. >> >>> + struct sysinfo i; >>> + int idx =3D 0; >>> + >>> + all_vm_events(events); >>> + si_meminfo(&i); >>> + >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_IN,=20 >>> K(events[PSWPIN])); >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_OUT,=20 >>> K(events[PSWPOUT])); >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT,=20 >>> events[PGMAJFAULT]); >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT])= ; >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE, K(i.freeram)); >>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMTOT, K(i.totalram)); > > > Finally I found some data from our M$ neighbors. This is from=20 > http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv-R2.mspx > > "When running Windows in the child partition, you can use the=20 > following performance counters within a child partition to identify=20 > whether the child partition is experiencing memory pressure and is=20 > likely to perform better with a higher VM memory size: > > Memory =96 Standby Cache Reserve Bytes: > Sum of Standby Cache Reserve Bytes and Free and Zero Page List Bytes=20 > should be 200 MB or more on systems with 1 GB, and 300 MB or more on=20 > systems with 2 GB or more of visible RAM. > > Memory =96 Free & Zero Page List Bytes: > Sum of Standby Cache Reserve Bytes and Free and Zero Page List Bytes=20 > should be 200 MB or more on systems with 1 GB, and 300 MB or more on=20 > systems with 2 GB or more of visible RAM. > > Memory =96 Pages Input/Sec: > Average over a 1-hour period is less than 10. > " > > The biggest take out of it is that we may get #0-pages in windows. > It's valuable information for the host since KSM will collapse all of=20 > them anyway, so there is not point in ballooning them. > Vadim, can you check if we can extract it from windows? We can do it from kernel mode by calling ZwQuerySystemInformation=20 function. But unfortunately SystemPerformanceInformation class is not=20 documented anywhere, except for Gary Nebbet book. It could be easier to query performance counters from user mode (say,=20 some sort of very light service) and pass the information down to=20 balloon driver. Using WMI could be another option, but I need to check whether we can=20 read performance counters from WMI provider at kernel level. > >>> +} >>> + >>> +/* >>> + * While most virtqueues communicate guest-initiated requests to=20 >>> the hypervisor, >>> + * the stats queue operates in reverse. The driver initializes the=20 >>> virtqueue >>> + * with a single buffer. From that point forward, all=20 >>> conversations consist of >>> + * a hypervisor request (a call to this function) which directs us=20 >>> to refill >>> + * the virtqueue with a fresh stats buffer. >>> + */ >>> +static void stats_ack(struct virtqueue *vq) >>> +{ >>> + struct virtio_balloon *vb; >>> + unsigned int len; >>> + struct scatterlist sg; >>> + >>> + vb =3D vq->vq_ops->get_buf(vq,&len); >>> + if (!vb) >>> + return; >>> + >>> + update_balloon_stats(vb); >>> + >>> + sg_init_one(&sg, vb->stats, sizeof(vb->stats)); >>> + if (vq->vq_ops->add_buf(vq,&sg, 1, 0, vb)< 0) >>> + BUG(); >>> + vq->vq_ops->kick(vq); >>> +} >>> + >>> static void virtballoon_changed(struct virtio_device *vdev) >>> { >>> struct virtio_balloon *vb =3D vdev->priv; >>> @@ -205,10 +259,10 @@ static int balloon(void *_vballoon) >>> static int virtballoon_probe(struct virtio_device *vdev) >>> { >>> struct virtio_balloon *vb; >>> - struct virtqueue *vqs[2]; >>> - vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack }; >>> - const char *names[] =3D { "inflate", "deflate" }; >>> - int err; >>> + struct virtqueue *vqs[3]; >>> + vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack,=20 >>> stats_ack }; >>> + const char *names[] =3D { "inflate", "deflate", "stats" }; >>> + int err, nvqs; >>> >>> vdev->priv =3D vb =3D kmalloc(sizeof(*vb), GFP_KERNEL); >>> if (!vb) { >>> @@ -221,13 +275,28 @@ static int virtballoon_probe(struct=20 >>> virtio_device *vdev) >>> init_waitqueue_head(&vb->config_change); >>> vb->vdev =3D vdev; >>> >>> - /* We expect two virtqueues. */ >>> - err =3D vdev->config->find_vqs(vdev, 2, vqs, callbacks, names); >>> + /* We expect two virtqueues: inflate and deflate, >>> + * and optionally stat. */ >>> + nvqs =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)= =20 >>> ? 3 : 2; >>> + err =3D vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names= ); >>> if (err) >>> goto out_free_vb; >>> >>> vb->inflate_vq =3D vqs[0]; >>> vb->deflate_vq =3D vqs[1]; >>> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { >>> + struct scatterlist sg; >>> + vb->stats_vq =3D vqs[2]; >>> + >>> + /* >>> + * Prime this virtqueue with one buffer so the hypervisor ca= n >>> + * use it to signal us later. >>> + */ >>> + sg_init_one(&sg, vb->stats, sizeof(vb->stats)); >> >> should be "sizeof vb->stats" >> >>> + if (vb->stats_vq->vq_ops->add_buf(vb->stats_vq,&sg, 1, 0,=20 >>> vb)< 0) >>> + BUG(); >>> + vb->stats_vq->vq_ops->kick(vb->stats_vq); >>> + } >>> >>> vb->thread =3D kthread_run(balloon, vb, "vballoon"); >>> if (IS_ERR(vb->thread)) { >>> @@ -265,7 +334,9 @@ static void virtballoon_remove(struct=20 >>> virtio_device *vdev) >>> kfree(vb); >>> } >>> >>> -static unsigned int features[] =3D { VIRTIO_BALLOON_F_MUST_TELL_HOST= }; >>> +static unsigned int features[] =3D { >>> + VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, >> >> one per line >> >>> +}; >>> >>> static struct virtio_driver virtio_balloon =3D { >>> .feature_table =3D features, >>> diff --git a/include/linux/virtio_balloon.h=20 >>> b/include/linux/virtio_balloon.h >>> index 09d7300..a5c09e6 100644 >>> --- a/include/linux/virtio_balloon.h >>> +++ b/include/linux/virtio_balloon.h >>> @@ -6,6 +6,7 @@ >>> >>> /* The feature bitmap for virtio balloon */ >>> #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before=20 >>> reclaiming pages */ >>> +#define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ >> >> align with a tab >>> >>> /* Size of a PFN in the balloon interface. */ >>> #define VIRTIO_BALLOON_PFN_SHIFT 12 >>> @@ -17,4 +18,19 @@ struct virtio_balloon_config >>> /* Number of pages we've actually got in balloon. */ >>> __le32 actual; >>> }; >>> + >>> +#define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped=20 >>> in */ >>> +#define VIRTIO_BALLOON_S_SWAP_OUT 1 /* Amount of memory swapped=20 >>> out */ >>> +#define VIRTIO_BALLOON_S_MAJFLT 2 /* Number of major faults */ >>> +#define VIRTIO_BALLOON_S_MINFLT 3 /* Number of minor faults */ >>> +#define VIRTIO_BALLOON_S_MEMFREE 4 /* Total amount of free=20 >>> memory */ >>> +#define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */ >> >> this seems to imply the total amount of memory in bytes? >> >>> +#define VIRTIO_BALLOON_S_NR 6 >>> + >>> +struct virtio_balloon_stat >>> +{ >>> + __le16 tag; >>> + __le64 val; >> >> This might create padding issues e.g. when guest >> is 32 bit and host is 64 bit. Better add padding >> manually, or pack the structure (but packed structures >> often cause gcc to generate terrible code). >> >> >>> +}; >>> + >>> #endif /* _LINUX_VIRTIO_BALLOON_H */ >>> >>> >>> --=20 >>> Thanks, >>> Adam >>> >>> _______________________________________________ >>> Virtualization mailing list >>> Virtualization@lists.linux-foundation.org >>> https://lists.linux-foundation.org/mailman/listinfo/virtualization >> _______________________________________________ >> Virtualization mailing list >> Virtualization@lists.linux-foundation.org >> https://lists.linux-foundation.org/mailman/listinfo/virtualization > > >