* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Andi Kleen @ 2012-06-26 16:52 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin, linux-kernel,
virtualization, linux-mm, Andi Kleen
In-Reply-To: <20120626101729.GF8103@csn.ul.ie>
>
> What shocked me actually is that VM_BUG_ON code is executed on
> !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> was this deliberate?
The idea was that the compiler optimizes it away anyways.
I'm not fully sure what putback_balloon_page does, but if it just tests
a bit (without non variable test_bit) it should be ok.
-Andi
^ permalink raw reply
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Andi Kleen @ 2012-06-26 16:54 UTC (permalink / raw)
To: Andi Kleen
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin, linux-kernel,
virtualization, linux-mm
In-Reply-To: <20120626165258.GY11413@one.firstfloor.org>
> I'm not fully sure what putback_balloon_page does, but if it just tests
> a bit (without non variable test_bit) it should be ok.
^^^^^^^^^^^^^
should be "without variable ..."
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Mel Gorman @ 2012-06-26 20:15 UTC (permalink / raw)
To: Andi Kleen
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin, linux-kernel,
virtualization, linux-mm
In-Reply-To: <20120626165258.GY11413@one.firstfloor.org>
On Tue, Jun 26, 2012 at 06:52:58PM +0200, Andi Kleen wrote:
> >
> > What shocked me actually is that VM_BUG_ON code is executed on
> > !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> > gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> > VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> > was this deliberate?
>
> The idea was that the compiler optimizes it away anyways.
>
> I'm not fully sure what putback_balloon_page does, but if it just tests
> a bit (without non variable test_bit) it should be ok.
>
This was the definition before
#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { } while (0)
#endif
and now it's
#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { (void)(cond); } while (0)
#endif
How is the compiler meant to optimise away "cond" if it's a function
call?
In the old definition VM_BUG_ON did nothing and the intention was that the
"cond" should never had any side-effects. It was to be used for potentially
expensive tests to catch additional issues in DEBUG_VM kernels. My concern
is that after commit 4e60c86bd that the VM doing these additional checks
unnecesarily with a performance hit. In most cases the checks are small
but in others such as free_pages we are calling virt_addr_valid() which
is heavier.
What did I miss? If nothing, then I will revert this particular change
and Rafael will need to be sure his patch is not using VM_BUG_ON to call
a function with side-effects.
--
Mel Gorman
SUSE Labs
^ permalink raw reply
* [PATCH] Add a page cache-backed balloon device driver.
From: Frank Swiderski @ 2012-06-26 20:32 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin, riel, Andrea Arcangeli
Cc: Frank Swiderski, mikew, linux-kernel, kvm, virtualization
This implementation of a virtio balloon driver uses the page cache to
"store" pages that have been released to the host. The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim). This means that inflating the balloon
is similar to the existing balloon mechanism, but the deflate is
different--it re-uses existing Linux kernel functionality to
automatically reclaim.
Signed-off-by: Frank Swiderski <fes@google.com>
---
drivers/virtio/Kconfig | 13 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_fileballoon.c | 636 +++++++++++++++++++++++++++++++++++
include/linux/virtio_balloon.h | 9 +
include/linux/virtio_ids.h | 1 +
5 files changed, 660 insertions(+), 0 deletions(-)
create mode 100644 drivers/virtio/virtio_fileballoon.c
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index f38b17a..cffa2a7 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -35,6 +35,19 @@ config VIRTIO_BALLOON
If unsure, say M.
+config VIRTIO_FILEBALLOON
+ tristate "Virtio page cache-backed balloon driver"
+ select VIRTIO
+ select VIRTIO_RING
+ ---help---
+ This driver supports decreasing and automatically reclaiming the
+ memory within a guest VM. Unlike VIRTIO_BALLOON, this driver instead
+ tries to maintain a specific target balloon size using the page cache.
+ This allows the guest to implicitly deflate the balloon by flushing
+ pages from the cache and touching the page.
+
+ If unsure, say N.
+
config VIRTIO_MMIO
tristate "Platform bus driver for memory mapped virtio devices (EXPERIMENTAL)"
depends on HAS_IOMEM && EXPERIMENTAL
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 5a4c63c..7ca0a3f 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
diff --git a/drivers/virtio/virtio_fileballoon.c b/drivers/virtio/virtio_fileballoon.c
new file mode 100644
index 0000000..ff252ec
--- /dev/null
+++ b/drivers/virtio/virtio_fileballoon.c
@@ -0,0 +1,636 @@
+/* Virtio file (page cache-backed) balloon implementation, inspired by
+ * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty Russel's
+ * implementation.
+ *
+ * This implementation of the virtio balloon driver re-uses the page cache to
+ * allow memory consumed by inflating the balloon to be reclaimed by linux. It
+ * creates and mounts a bare-bones filesystem containing a single inode. When
+ * the host requests the balloon to inflate, it does so by "reading" pages at
+ * offsets into the inode mapping's page_tree. The host is notified when the
+ * pages are added to the page_tree, allowing it (the host) to madvise(2) the
+ * corresponding host memory, reducing the RSS of the virtual machine. In this
+ * implementation, the host is only notified when a page is added to the
+ * balloon. Reclaim happens under the existing TTFP logic, which flushes unused
+ * pages in the page cache. If the host used MADV_DONTNEED, then when the guest
+ * uses the page, the zero page will be mapped in, allowing automatic (and fast,
+ * compared to requiring a host notification via a virtio queue to get memory
+ * back) reclaim.
+ *
+ * Copyright 2008 Rusty Russell IBM Corporation
+ * Copyright 2011 Frank Swiderski Google Inc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <linux/backing-dev.h>
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/freezer.h>
+#include <linux/fs.h>
+#include <linux/jiffies.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/virtio.h>
+#include <linux/virtio_balloon.h>
+#include <linux/writeback.h>
+
+#define VIRTBALLOON_PFN_ARRAY_SIZE 256
+
+struct virtio_balloon {
+ struct virtio_device *vdev;
+ struct virtqueue *inflate_vq;
+
+ /* Where the ballooning thread waits for config to change. */
+ wait_queue_head_t config_change;
+
+ /* The thread servicing the balloon. */
+ struct task_struct *thread;
+
+ /* Waiting for host to ack the pages we released. */
+ struct completion acked;
+
+ /* The array of pfns we tell the Host about. */
+ unsigned int num_pfns;
+ u32 pfns[VIRTBALLOON_PFN_ARRAY_SIZE];
+
+ struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+ /* The last page offset read into the mapping's page_tree */
+ unsigned long last_scan_page_array;
+
+ /* The last time a page was reclaimed */
+ unsigned long last_reclaim;
+};
+
+/* Magic number used for the skeleton filesystem in the call to mount_pseudo */
+#define BALLOONFS_MAGIC 0x42414c4c
+
+static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_FILE_BALLOON, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+/*
+ * The skeleton filesystem contains a single inode, held by the structure below.
+ * Using the containing structure below allows easy access to the struct
+ * virtio_balloon.
+ */
+static struct balloon_inode {
+ struct inode inode;
+ struct virtio_balloon *vb;
+} the_inode;
+
+/*
+ * balloon_alloc_inode is called when the single inode for the skeleton
+ * filesystem is created in init() with the call to new_inode.
+ */
+static struct inode *balloon_alloc_inode(struct super_block *sb)
+{
+ static bool already_inited;
+ /* We should only ever be called once! */
+ BUG_ON(already_inited);
+ already_inited = true;
+ inode_init_once(&the_inode.inode);
+ return &the_inode.inode;
+}
+
+/* Noop implementation of destroy_inode. */
+static void balloon_destroy_inode(struct inode *inode)
+{
+}
+
+static int balloon_sync_fs(struct super_block *sb, int wait)
+{
+ return filemap_write_and_wait(the_inode.inode.i_mapping);
+}
+
+static const struct super_operations balloonfs_ops = {
+ .alloc_inode = balloon_alloc_inode,
+ .destroy_inode = balloon_destroy_inode,
+ .sync_fs = balloon_sync_fs,
+};
+
+static const struct dentry_operations balloonfs_dentry_operations = {
+};
+
+/*
+ * balloonfs_writepage is called when linux needs to reclaim memory held using
+ * the balloonfs' page cache.
+ */
+static int balloonfs_writepage(struct page *page, struct writeback_control *wbc)
+{
+ the_inode.vb->last_reclaim = jiffies;
+ SetPageUptodate(page);
+ ClearPageDirty(page);
+ /*
+ * If the page isn't being flushed from the page allocator, go ahead and
+ * drop it from the page cache anyway.
+ */
+ if (!wbc->for_reclaim)
+ delete_from_page_cache(page);
+ unlock_page(page);
+ return 0;
+}
+
+/* Nearly no-op implementation of readpage */
+static int balloonfs_readpage(struct file *file, struct page *page)
+{
+ SetPageUptodate(page);
+ unlock_page(page);
+ return 0;
+}
+
+static const struct address_space_operations balloonfs_aops = {
+ .writepage = balloonfs_writepage,
+ .readpage = balloonfs_readpage
+};
+
+static struct backing_dev_info balloonfs_backing_dev_info = {
+ .name = "balloonfs",
+ .ra_pages = 0,
+ .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK
+};
+
+static struct dentry *balloonfs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ struct dentry *root;
+ struct inode *inode;
+ root = mount_pseudo(fs_type, "balloon:", &balloonfs_ops,
+ &balloonfs_dentry_operations, BALLOONFS_MAGIC);
+ inode = root->d_inode;
+ inode->i_mapping->a_ops = &balloonfs_aops;
+ mapping_set_gfp_mask(inode->i_mapping,
+ (GFP_HIGHUSER | __GFP_NOMEMALLOC));
+ inode->i_mapping->backing_dev_info = &balloonfs_backing_dev_info;
+ return root;
+}
+
+/* The single mounted skeleton filesystem */
+static struct vfsmount *balloon_mnt __read_mostly;
+
+static struct file_system_type balloon_fs_type = {
+ .name = "balloonfs",
+ .mount = balloonfs_mount,
+ .kill_sb = kill_anon_super,
+};
+
+/* Acknowledges a message from the specified virtqueue. */
+static void balloon_ack(struct virtqueue *vq)
+{
+ struct virtio_balloon *vb;
+ unsigned int len;
+
+ vb = virtqueue_get_buf(vq, &len);
+ if (vb)
+ complete(&vb->acked);
+}
+
+/*
+ * Scans the page_tree for the inode's mapping, looking for an offset that is
+ * currently empty, returning that index (or 0 if it could not fill the
+ * request).
+ */
+static unsigned long find_available_inode_page(struct virtio_balloon *vb)
+{
+ unsigned long radix_index, index, max_scan;
+ struct address_space *mapping = the_inode.inode.i_mapping;
+
+ /*
+ * This function is a serialized call (only happens on the free-to-host
+ * thread), so no locking is necessary here.
+ */
+ index = vb->last_scan_page_array;
+ max_scan = totalram_pages - vb->last_scan_page_array;
+
+ /*
+ * Scan starting at the last scanned offset, then wrap around if
+ * necessary.
+ */
+ if (index == 0)
+ index = 1;
+ rcu_read_lock();
+ radix_index = radix_tree_next_hole(&mapping->page_tree,
+ index, max_scan);
+ rcu_read_unlock();
+ /*
+ * If we hit the end of the tree, wrap and search up to the original
+ * index.
+ */
+ if (radix_index - index >= max_scan) {
+ if (index != 1) {
+ rcu_read_lock();
+ radix_index = radix_tree_next_hole(&mapping->page_tree,
+ 1, index);
+ rcu_read_unlock();
+ if (radix_index - 1 >= index)
+ radix_index = 0;
+ } else {
+ radix_index = 0;
+ }
+ }
+ vb->last_scan_page_array = radix_index;
+
+ return radix_index;
+}
+
+/* Notifies the host of pages in the specified virtqueue. */
+static int tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+{
+ int err;
+ struct scatterlist sg;
+
+ sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+
+ init_completion(&vb->acked);
+
+ /* We should always be able to add one buffer to an empty queue. */
+ err = virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL);
+ if (err < 0)
+ return err;
+ virtqueue_kick(vq);
+
+ /* When host has read buffer, this completes via balloon_ack */
+ wait_for_completion(&vb->acked);
+ return err;
+}
+
+static void fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+ int err;
+
+ /* We can only do one array worth at a time. */
+ num = min(num, ARRAY_SIZE(vb->pfns));
+
+ for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
+ struct page *page;
+ unsigned long inode_pfn = find_available_inode_page(vb);
+ /* Should always be able to find a page. */
+ BUG_ON(!inode_pfn);
+ page = read_mapping_page(the_inode.inode.i_mapping, inode_pfn,
+ NULL);
+ if (IS_ERR(page)) {
+ if (printk_ratelimit())
+ dev_printk(KERN_INFO, &vb->vdev->dev,
+ "Out of puff! Can't get %zu pages\n",
+ num);
+ break;
+ }
+
+ /* Set the page to be dirty */
+ set_page_dirty(page);
+
+ vb->pfns[vb->num_pfns] = page_to_pfn(page);
+ }
+
+ /* Didn't get any? Oh well. */
+ if (vb->num_pfns == 0)
+ return;
+
+ /* Notify the host of the pages we just added to the page_tree. */
+ err = tell_host(vb, vb->inflate_vq);
+
+ for (; vb->num_pfns != 0; vb->num_pfns--) {
+ struct page *page = pfn_to_page(vb->pfns[vb->num_pfns - 1]);
+ /*
+ * Release our refcount on the page so that it can be reclaimed
+ * when necessary.
+ */
+ page_cache_release(page);
+ }
+ __mark_inode_dirty(&the_inode.inode, I_DIRTY_PAGES);
+}
+
+static inline void update_stat(struct virtio_balloon *vb, int idx,
+ u64 val)
+{
+ BUG_ON(idx >= VIRTIO_BALLOON_S_NR);
+ vb->stats[idx].tag = idx;
+ vb->stats[idx].val = val;
+}
+
+#define pages_to_bytes(x) ((u64)(x) << PAGE_SHIFT)
+
+static inline u32 config_pages(struct virtio_balloon *vb);
+static void update_balloon_stats(struct virtio_balloon *vb)
+{
+ unsigned long events[NR_VM_EVENT_ITEMS];
+ struct sysinfo i;
+
+ all_vm_events(events);
+ si_meminfo(&i);
+
+ update_stat(vb, VIRTIO_BALLOON_S_SWAP_IN,
+ pages_to_bytes(events[PSWPIN]));
+ update_stat(vb, VIRTIO_BALLOON_S_SWAP_OUT,
+ pages_to_bytes(events[PSWPOUT]));
+ update_stat(vb, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
+ update_stat(vb, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
+
+ /* Total and Free Mem */
+ update_stat(vb, VIRTIO_BALLOON_S_MEMFREE, pages_to_bytes(i.freeram));
+ update_stat(vb, VIRTIO_BALLOON_S_MEMTOT, pages_to_bytes(i.totalram));
+}
+
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+ struct virtio_balloon *vb = vdev->priv;
+
+ wake_up(&vb->config_change);
+}
+
+static inline bool config_need_stats(struct virtio_balloon *vb)
+{
+ u32 v = 0;
+
+ vb->vdev->config->get(vb->vdev,
+ offsetof(struct virtio_balloon_config,
+ need_stats),
+ &v, sizeof(v));
+ return (v != 0);
+}
+
+static inline u32 config_pages(struct virtio_balloon *vb)
+{
+ u32 v = 0;
+
+ vb->vdev->config->get(vb->vdev,
+ offsetof(struct virtio_balloon_config, num_pages),
+ &v, sizeof(v));
+ return v;
+}
+
+static inline s64 towards_target(struct virtio_balloon *vb)
+{
+ struct address_space *mapping = the_inode.inode.i_mapping;
+ u32 v = config_pages(vb);
+
+ return (s64)v - (mapping ? mapping->nrpages : 0);
+}
+
+static void update_balloon_size(struct virtio_balloon *vb)
+{
+ struct address_space *mapping = the_inode.inode.i_mapping;
+ __le32 actual = cpu_to_le32((mapping ? mapping->nrpages : 0));
+
+ vb->vdev->config->set(vb->vdev,
+ offsetof(struct virtio_balloon_config, actual),
+ &actual, sizeof(actual));
+}
+
+static void update_free_and_total(struct virtio_balloon *vb)
+{
+ struct sysinfo i;
+ u32 value;
+
+ si_meminfo(&i);
+
+ update_balloon_stats(vb);
+ value = i.totalram;
+ vb->vdev->config->set(vb->vdev,
+ offsetof(struct virtio_balloon_config,
+ pages_total),
+ &value, sizeof(value));
+ value = i.freeram;
+ vb->vdev->config->set(vb->vdev,
+ offsetof(struct virtio_balloon_config,
+ pages_free),
+ &value, sizeof(value));
+ value = 0;
+ vb->vdev->config->set(vb->vdev,
+ offsetof(struct virtio_balloon_config,
+ need_stats),
+ &value, sizeof(value));
+}
+
+static int balloon(void *_vballoon)
+{
+ struct virtio_balloon *vb = _vballoon;
+
+ set_freezable();
+ while (!kthread_should_stop()) {
+ s64 diff;
+ try_to_freeze();
+ wait_event_interruptible(vb->config_change,
+ (diff = towards_target(vb)) > 0
+ || config_need_stats(vb)
+ || kthread_should_stop()
+ || freezing(current));
+ if (config_need_stats(vb))
+ update_free_and_total(vb);
+ if (diff > 0) {
+ unsigned long reclaim_time = vb->last_reclaim + 2 * HZ;
+ /*
+ * Don't fill the balloon if a page reclaim happened in
+ * the past 2 seconds.
+ */
+ if (time_after_eq(reclaim_time, jiffies)) {
+ /* Inflating too fast--sleep and skip. */
+ msleep(500);
+ } else {
+ fill_balloon(vb, diff);
+ }
+ } else if (diff < 0 && config_pages(vb) == 0) {
+ /*
+ * Here we are specifically looking to detect the case
+ * where there are pages in the page cache, but the
+ * device wants us to go to 0. This is used in save/
+ * restore since the host device doesn't keep track of
+ * PFNs, and must flush the page cache on restore
+ * (which loses the context of the original device
+ * instance). However, we still suggest syncing the
+ * diff so that we can get within the target range.
+ */
+ s64 nr_to_write =
+ (!config_pages(vb) ? LONG_MAX : -diff);
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = nr_to_write,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ };
+ sync_inode(&the_inode.inode, &wbc);
+ }
+ update_balloon_size(vb);
+ }
+ return 0;
+}
+
+static ssize_t virtballoon_attr_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf);
+
+static DEVICE_ATTR(total_memory, 0644,
+ virtballoon_attr_show, NULL);
+
+static DEVICE_ATTR(free_memory, 0644,
+ virtballoon_attr_show, NULL);
+
+static DEVICE_ATTR(target_pages, 0644,
+ virtballoon_attr_show, NULL);
+
+static DEVICE_ATTR(actual_pages, 0644,
+ virtballoon_attr_show, NULL);
+
+static struct attribute *virtballoon_attrs[] = {
+ &dev_attr_total_memory.attr,
+ &dev_attr_free_memory.attr,
+ &dev_attr_target_pages.attr,
+ &dev_attr_actual_pages.attr,
+ NULL
+};
+static struct attribute_group virtballoon_attr_group = {
+ .name = "virtballoon",
+ .attrs = virtballoon_attrs,
+};
+
+static ssize_t virtballoon_attr_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct address_space *mapping = the_inode.inode.i_mapping;
+ struct virtio_device *vdev = container_of(dev, struct virtio_device,
+ dev);
+ struct virtio_balloon *vb = vdev->priv;
+ unsigned long long value = 0;
+ if (attr == &dev_attr_total_memory)
+ value = vb->stats[VIRTIO_BALLOON_S_MEMTOT].val;
+ else if (attr == &dev_attr_free_memory)
+ value = vb->stats[VIRTIO_BALLOON_S_MEMFREE].val;
+ else if (attr == &dev_attr_target_pages)
+ value = config_pages(vb);
+ else if (attr == &dev_attr_actual_pages)
+ value = cpu_to_le32((mapping ? mapping->nrpages : 0));
+ return sprintf(buf, "%llu\n", value);
+}
+
+static int virtballoon_probe(struct virtio_device *vdev)
+{
+ struct virtio_balloon *vb;
+ struct virtqueue *vq[1];
+ vq_callback_t *callback = balloon_ack;
+ const char *name = "inflate";
+ int err;
+
+ vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
+ if (!vb) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ init_waitqueue_head(&vb->config_change);
+ vb->vdev = vdev;
+
+ /* We use one virtqueue: inflate */
+ err = vdev->config->find_vqs(vdev, 1, vq, &callback, &name);
+ if (err)
+ goto out_free_vb;
+
+ vb->inflate_vq = vq[0];
+
+ err = sysfs_create_group(&vdev->dev.kobj, &virtballoon_attr_group);
+ if (err) {
+ pr_err("Failed to create virtballoon sysfs node\n");
+ goto out_free_vb;
+ }
+
+ vb->last_scan_page_array = 0;
+ vb->last_reclaim = 0;
+ the_inode.vb = vb;
+
+ vb->thread = kthread_run(balloon, vb, "vballoon");
+ if (IS_ERR(vb->thread)) {
+ err = PTR_ERR(vb->thread);
+ goto out_del_vqs;
+ }
+
+ return 0;
+
+out_del_vqs:
+ vdev->config->del_vqs(vdev);
+out_free_vb:
+ kfree(vb);
+out:
+ return err;
+}
+
+static void __devexit virtballoon_remove(struct virtio_device *vdev)
+{
+ struct virtio_balloon *vb = vdev->priv;
+
+ kthread_stop(vb->thread);
+
+ sysfs_remove_group(&vdev->dev.kobj, &virtballoon_attr_group);
+
+ /* Now we reset the device so we can clean up the queues. */
+ vdev->config->reset(vdev);
+
+ vdev->config->del_vqs(vdev);
+ kfree(vb);
+}
+
+static struct virtio_driver virtio_balloon_driver = {
+ .feature_table = NULL,
+ .feature_table_size = 0,
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .probe = virtballoon_probe,
+ .remove = __devexit_p(virtballoon_remove),
+ .config_changed = virtballoon_changed,
+};
+
+static int __init init(void)
+{
+ int err = register_filesystem(&balloon_fs_type);
+ if (err)
+ goto out;
+
+ balloon_mnt = kern_mount(&balloon_fs_type);
+ if (IS_ERR(balloon_mnt)) {
+ err = PTR_ERR(balloon_mnt);
+ goto out_filesystem;
+ }
+
+ err = register_virtio_driver(&virtio_balloon_driver);
+ if (err)
+ goto out_filesystem;
+
+ goto out;
+
+out_filesystem:
+ unregister_filesystem(&balloon_fs_type);
+
+out:
+ return err;
+}
+
+static void __exit fini(void)
+{
+ if (balloon_mnt) {
+ unregister_filesystem(&balloon_fs_type);
+ balloon_mnt = NULL;
+ }
+ unregister_virtio_driver(&virtio_balloon_driver);
+}
+module_init(init);
+module_exit(fini);
+
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio file (page cache-backed) balloon driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index 652dc8b..2be9a02 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -41,6 +41,15 @@ struct virtio_balloon_config
__le32 num_pages;
/* Number of pages we've actually got in balloon. */
__le32 actual;
+#if defined(CONFIG_VIRTIO_FILEBALLOON) ||\
+ defined(CONFIG_VIRTIO_FILEBALLOON_MODULE)
+ /* Total pages on this system. */
+ __le32 pages_total;
+ /* Free pages on this system. */
+ __le32 pages_free;
+ /* If the device needs pages_total/pages_free updated. */
+ __le32 need_stats;
+#endif
};
#define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */
diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
index 7529b85..2f081d7 100644
--- a/include/linux/virtio_ids.h
+++ b/include/linux/virtio_ids.h
@@ -37,5 +37,6 @@
#define VIRTIO_ID_RPMSG 7 /* virtio remote processor messaging */
#define VIRTIO_ID_SCSI 8 /* virtio scsi */
#define VIRTIO_ID_9P 9 /* 9p virtio console */
+#define VIRTIO_ID_FILE_BALLOON 10 /* virtio file-backed balloon */
#endif /* _LINUX_VIRTIO_IDS_H */
--
1.7.7.3
^ permalink raw reply related
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Andi Kleen @ 2012-06-26 20:34 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin, linux-kernel,
virtualization, linux-mm, Andi Kleen
In-Reply-To: <20120626201513.GJ8103@csn.ul.ie>
> How is the compiler meant to optimise away "cond" if it's a function
> call?
Inlines can be optimized away. These tests are usually inlines.
> What did I miss? If nothing, then I will revert this particular change
> and Rafael will need to be sure his patch is not using VM_BUG_ON to call
> a function with side-effects.
Do you have an example where the code is actually different,
or are you just speculating?
FWIW for my config both generates the same code:
size vmlinux-andi-vmbug vmlinux-vmbug-nothing
text data bss dec hex filename
11809704 1457352 1159168 14426224 dc2070 vmlinux-andi-vmbug
11809704 1457352 1159168 14426224 dc2070 vmlinux-vmbug-nothing
-Andi
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Rik van Riel @ 2012-06-26 20:40 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, Rafael Aquini, kvm, Michael S. Tsirkin,
linux-kernel, virtualization, Ying Han, mikew
In-Reply-To: <1340742778-11282-1-git-send-email-fes@google.com>
On 06/26/2012 04:32 PM, Frank Swiderski wrote:
> This implementation of a virtio balloon driver uses the page cache to
> "store" pages that have been released to the host. The communication
> (outside of target counts) is one way--the guest notifies the host when
> it adds a page to the page cache, allowing the host to madvise(2) with
> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> (via the regular page reclaim). This means that inflating the balloon
> is similar to the existing balloon mechanism, but the deflate is
> different--it re-uses existing Linux kernel functionality to
> automatically reclaim.
>
> Signed-off-by: Frank Swiderski<fes@google.com>
It is a great idea, but how can this memory balancing
possibly work if someone uses memory cgroups inside a
guest?
Having said that, we currently do not have proper
memory reclaim balancing between cgroups at all, so
requiring that of this balloon driver would be
unreasonable.
The code looks good to me, my only worry is the
code duplication. We now have 5 balloon drivers,
for 4 hypervisors, all implementing everything
from scratch...
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Frank Swiderski @ 2012-06-26 21:31 UTC (permalink / raw)
To: Rik van Riel
Cc: Andrea Arcangeli, Rafael Aquini, kvm, Michael S. Tsirkin,
linux-kernel, virtualization, Ying Han, mikew
In-Reply-To: <4FEA1E2E.4020806@redhat.com>
On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <riel@redhat.com> wrote:
> On 06/26/2012 04:32 PM, Frank Swiderski wrote:
>>
>> This implementation of a virtio balloon driver uses the page cache to
>> "store" pages that have been released to the host. The communication
>> (outside of target counts) is one way--the guest notifies the host when
>> it adds a page to the page cache, allowing the host to madvise(2) with
>> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
>> (via the regular page reclaim). This means that inflating the balloon
>> is similar to the existing balloon mechanism, but the deflate is
>> different--it re-uses existing Linux kernel functionality to
>> automatically reclaim.
>>
>> Signed-off-by: Frank Swiderski<fes@google.com>
>
>
> It is a great idea, but how can this memory balancing
> possibly work if someone uses memory cgroups inside a
> guest?
Thanks and good point--this isn't something that I considered in the
implementation.
> Having said that, we currently do not have proper
> memory reclaim balancing between cgroups at all, so
> requiring that of this balloon driver would be
> unreasonable.
>
> The code looks good to me, my only worry is the
> code duplication. We now have 5 balloon drivers,
> for 4 hypervisors, all implementing everything
> from scratch...
Do you have any recommendations on this? I could (I think reasonably
so) modify the existing virtio_balloon.c and have it change behavior
based on a feature bit or other configuration. I'm not sure that
really addresses the root of what you're pointing out--it's still
adding a different implementation, but doing so as an extension of an
existing one.
fes
^ permalink raw reply
* Re: [PATCH 00/13] drivers: hv: kvp
From: Greg KH @ 2012-06-26 21:39 UTC (permalink / raw)
To: KY Srinivasan
Cc: apw@canonical.com, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, ohering@suse.com,
linux-kernel@vger.kernel.org
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155ED14D@SN2PRD0310MB382.namprd03.prod.outlook.com>
On Tue, Jun 26, 2012 at 02:29:49AM +0000, KY Srinivasan wrote:
>
>
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > Sent: Friday, June 22, 2012 9:26 AM
> > To: KY Srinivasan
> > Cc: apw@canonical.com; devel@linuxdriverproject.org; ohering@suse.com;
> > linux-kernel@vger.kernel.org; virtualization@lists.osdl.org
> > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> >
> > On Fri, Jun 22, 2012 at 01:06:53PM +0000, KY Srinivasan wrote:
> > > Are you still missing it; do you want me to resend the whole set?
> >
> > Nope, it showed up a few hours later, thanks. You really should get
> > that fixed...
>
> Greg, some additional testing with VM replication on Windows Server
> 2012 has revealed some issues with this patch-set. Please drop the
> patch set, I will fix the problem and resend it.
Thanks for letting me know, now dropped.
But, for the next round, I suggest you run this patchset by the netdev
developers, as I think you are doing things in an "odd" way for some of
the ip determination.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Michael S. Tsirkin @ 2012-06-26 21:41 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, riel, kvm, linux-kernel, virtualization, mikew
In-Reply-To: <1340742778-11282-1-git-send-email-fes@google.com>
On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
> This implementation of a virtio balloon driver uses the page cache to
> "store" pages that have been released to the host. The communication
> (outside of target counts) is one way--the guest notifies the host when
> it adds a page to the page cache, allowing the host to madvise(2) with
> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> (via the regular page reclaim). This means that inflating the balloon
> is similar to the existing balloon mechanism, but the deflate is
> different--it re-uses existing Linux kernel functionality to
> automatically reclaim.
>
> Signed-off-by: Frank Swiderski <fes@google.com>
I'm pondering this:
Should it really be a separate driver/device ID?
If it behaves the same from host POV, maybe it
should be up to the guest how to inflate/deflate
the balloon internally?
> ---
> drivers/virtio/Kconfig | 13 +
> drivers/virtio/Makefile | 1 +
> drivers/virtio/virtio_fileballoon.c | 636 +++++++++++++++++++++++++++++++++++
> include/linux/virtio_balloon.h | 9 +
> include/linux/virtio_ids.h | 1 +
> 5 files changed, 660 insertions(+), 0 deletions(-)
> create mode 100644 drivers/virtio/virtio_fileballoon.c
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index f38b17a..cffa2a7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -35,6 +35,19 @@ config VIRTIO_BALLOON
>
> If unsure, say M.
>
> +config VIRTIO_FILEBALLOON
> + tristate "Virtio page cache-backed balloon driver"
> + select VIRTIO
> + select VIRTIO_RING
> + ---help---
> + This driver supports decreasing and automatically reclaiming the
> + memory within a guest VM. Unlike VIRTIO_BALLOON, this driver instead
> + tries to maintain a specific target balloon size using the page cache.
> + This allows the guest to implicitly deflate the balloon by flushing
> + pages from the cache and touching the page.
> +
> + If unsure, say N.
> +
> config VIRTIO_MMIO
> tristate "Platform bus driver for memory mapped virtio devices (EXPERIMENTAL)"
> depends on HAS_IOMEM && EXPERIMENTAL
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 5a4c63c..7ca0a3f 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
> obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
> obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
> diff --git a/drivers/virtio/virtio_fileballoon.c b/drivers/virtio/virtio_fileballoon.c
> new file mode 100644
> index 0000000..ff252ec
> --- /dev/null
> +++ b/drivers/virtio/virtio_fileballoon.c
> @@ -0,0 +1,636 @@
> +/* Virtio file (page cache-backed) balloon implementation, inspired by
> + * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty Russel's
> + * implementation.
> + *
> + * This implementation of the virtio balloon driver re-uses the page cache to
> + * allow memory consumed by inflating the balloon to be reclaimed by linux. It
> + * creates and mounts a bare-bones filesystem containing a single inode. When
> + * the host requests the balloon to inflate, it does so by "reading" pages at
> + * offsets into the inode mapping's page_tree. The host is notified when the
> + * pages are added to the page_tree, allowing it (the host) to madvise(2) the
> + * corresponding host memory, reducing the RSS of the virtual machine. In this
> + * implementation, the host is only notified when a page is added to the
> + * balloon. Reclaim happens under the existing TTFP logic, which flushes unused
> + * pages in the page cache. If the host used MADV_DONTNEED, then when the guest
> + * uses the page, the zero page will be mapped in, allowing automatic (and fast,
> + * compared to requiring a host notification via a virtio queue to get memory
> + * back) reclaim.
> + *
> + * Copyright 2008 Rusty Russell IBM Corporation
> + * Copyright 2011 Frank Swiderski Google Inc
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +#include <linux/backing-dev.h>
> +#include <linux/delay.h>
> +#include <linux/file.h>
> +#include <linux/freezer.h>
> +#include <linux/fs.h>
> +#include <linux/jiffies.h>
> +#include <linux/kthread.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
> +#include <linux/pagemap.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_balloon.h>
> +#include <linux/writeback.h>
> +
> +#define VIRTBALLOON_PFN_ARRAY_SIZE 256
> +
> +struct virtio_balloon {
> + struct virtio_device *vdev;
> + struct virtqueue *inflate_vq;
> +
> + /* Where the ballooning thread waits for config to change. */
> + wait_queue_head_t config_change;
> +
> + /* The thread servicing the balloon. */
> + struct task_struct *thread;
> +
> + /* Waiting for host to ack the pages we released. */
> + struct completion acked;
> +
> + /* The array of pfns we tell the Host about. */
> + unsigned int num_pfns;
> + u32 pfns[VIRTBALLOON_PFN_ARRAY_SIZE];
> +
> + struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> +
> + /* The last page offset read into the mapping's page_tree */
> + unsigned long last_scan_page_array;
> +
> + /* The last time a page was reclaimed */
> + unsigned long last_reclaim;
> +};
> +
> +/* Magic number used for the skeleton filesystem in the call to mount_pseudo */
> +#define BALLOONFS_MAGIC 0x42414c4c
> +
> +static struct virtio_device_id id_table[] = {
> + { VIRTIO_ID_FILE_BALLOON, VIRTIO_DEV_ANY_ID },
> + { 0 },
> +};
> +
> +/*
> + * The skeleton filesystem contains a single inode, held by the structure below.
> + * Using the containing structure below allows easy access to the struct
> + * virtio_balloon.
> + */
> +static struct balloon_inode {
> + struct inode inode;
> + struct virtio_balloon *vb;
> +} the_inode;
> +
> +/*
> + * balloon_alloc_inode is called when the single inode for the skeleton
> + * filesystem is created in init() with the call to new_inode.
> + */
> +static struct inode *balloon_alloc_inode(struct super_block *sb)
> +{
> + static bool already_inited;
> + /* We should only ever be called once! */
> + BUG_ON(already_inited);
> + already_inited = true;
> + inode_init_once(&the_inode.inode);
> + return &the_inode.inode;
> +}
> +
> +/* Noop implementation of destroy_inode. */
> +static void balloon_destroy_inode(struct inode *inode)
> +{
> +}
> +
> +static int balloon_sync_fs(struct super_block *sb, int wait)
> +{
> + return filemap_write_and_wait(the_inode.inode.i_mapping);
> +}
> +
> +static const struct super_operations balloonfs_ops = {
> + .alloc_inode = balloon_alloc_inode,
> + .destroy_inode = balloon_destroy_inode,
> + .sync_fs = balloon_sync_fs,
> +};
> +
> +static const struct dentry_operations balloonfs_dentry_operations = {
> +};
> +
> +/*
> + * balloonfs_writepage is called when linux needs to reclaim memory held using
> + * the balloonfs' page cache.
> + */
> +static int balloonfs_writepage(struct page *page, struct writeback_control *wbc)
> +{
> + the_inode.vb->last_reclaim = jiffies;
> + SetPageUptodate(page);
> + ClearPageDirty(page);
> + /*
> + * If the page isn't being flushed from the page allocator, go ahead and
> + * drop it from the page cache anyway.
> + */
> + if (!wbc->for_reclaim)
> + delete_from_page_cache(page);
> + unlock_page(page);
> + return 0;
> +}
> +
> +/* Nearly no-op implementation of readpage */
> +static int balloonfs_readpage(struct file *file, struct page *page)
> +{
> + SetPageUptodate(page);
> + unlock_page(page);
> + return 0;
> +}
> +
> +static const struct address_space_operations balloonfs_aops = {
> + .writepage = balloonfs_writepage,
> + .readpage = balloonfs_readpage
> +};
> +
> +static struct backing_dev_info balloonfs_backing_dev_info = {
> + .name = "balloonfs",
> + .ra_pages = 0,
> + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK
> +};
> +
> +static struct dentry *balloonfs_mount(struct file_system_type *fs_type,
> + int flags, const char *dev_name, void *data)
> +{
> + struct dentry *root;
> + struct inode *inode;
> + root = mount_pseudo(fs_type, "balloon:", &balloonfs_ops,
> + &balloonfs_dentry_operations, BALLOONFS_MAGIC);
> + inode = root->d_inode;
> + inode->i_mapping->a_ops = &balloonfs_aops;
> + mapping_set_gfp_mask(inode->i_mapping,
> + (GFP_HIGHUSER | __GFP_NOMEMALLOC));
> + inode->i_mapping->backing_dev_info = &balloonfs_backing_dev_info;
> + return root;
> +}
> +
> +/* The single mounted skeleton filesystem */
> +static struct vfsmount *balloon_mnt __read_mostly;
> +
> +static struct file_system_type balloon_fs_type = {
> + .name = "balloonfs",
> + .mount = balloonfs_mount,
> + .kill_sb = kill_anon_super,
> +};
> +
> +/* Acknowledges a message from the specified virtqueue. */
> +static void balloon_ack(struct virtqueue *vq)
> +{
> + struct virtio_balloon *vb;
> + unsigned int len;
> +
> + vb = virtqueue_get_buf(vq, &len);
> + if (vb)
> + complete(&vb->acked);
> +}
> +
> +/*
> + * Scans the page_tree for the inode's mapping, looking for an offset that is
> + * currently empty, returning that index (or 0 if it could not fill the
> + * request).
> + */
> +static unsigned long find_available_inode_page(struct virtio_balloon *vb)
> +{
> + unsigned long radix_index, index, max_scan;
> + struct address_space *mapping = the_inode.inode.i_mapping;
> +
> + /*
> + * This function is a serialized call (only happens on the free-to-host
> + * thread), so no locking is necessary here.
> + */
> + index = vb->last_scan_page_array;
> + max_scan = totalram_pages - vb->last_scan_page_array;
> +
> + /*
> + * Scan starting at the last scanned offset, then wrap around if
> + * necessary.
> + */
> + if (index == 0)
> + index = 1;
> + rcu_read_lock();
> + radix_index = radix_tree_next_hole(&mapping->page_tree,
> + index, max_scan);
> + rcu_read_unlock();
> + /*
> + * If we hit the end of the tree, wrap and search up to the original
> + * index.
> + */
> + if (radix_index - index >= max_scan) {
> + if (index != 1) {
> + rcu_read_lock();
> + radix_index = radix_tree_next_hole(&mapping->page_tree,
> + 1, index);
> + rcu_read_unlock();
> + if (radix_index - 1 >= index)
> + radix_index = 0;
> + } else {
> + radix_index = 0;
> + }
> + }
> + vb->last_scan_page_array = radix_index;
> +
> + return radix_index;
> +}
> +
> +/* Notifies the host of pages in the specified virtqueue. */
> +static int tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> +{
> + int err;
> + struct scatterlist sg;
> +
> + sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> +
> + init_completion(&vb->acked);
> +
> + /* We should always be able to add one buffer to an empty queue. */
> + err = virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL);
> + if (err < 0)
> + return err;
> + virtqueue_kick(vq);
> +
> + /* When host has read buffer, this completes via balloon_ack */
> + wait_for_completion(&vb->acked);
> + return err;
> +}
> +
> +static void fill_balloon(struct virtio_balloon *vb, size_t num)
> +{
> + int err;
> +
> + /* We can only do one array worth at a time. */
> + num = min(num, ARRAY_SIZE(vb->pfns));
> +
> + for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
> + struct page *page;
> + unsigned long inode_pfn = find_available_inode_page(vb);
> + /* Should always be able to find a page. */
> + BUG_ON(!inode_pfn);
> + page = read_mapping_page(the_inode.inode.i_mapping, inode_pfn,
> + NULL);
> + if (IS_ERR(page)) {
> + if (printk_ratelimit())
> + dev_printk(KERN_INFO, &vb->vdev->dev,
> + "Out of puff! Can't get %zu pages\n",
> + num);
> + break;
> + }
> +
> + /* Set the page to be dirty */
> + set_page_dirty(page);
> +
> + vb->pfns[vb->num_pfns] = page_to_pfn(page);
> + }
> +
> + /* Didn't get any? Oh well. */
> + if (vb->num_pfns == 0)
> + return;
> +
> + /* Notify the host of the pages we just added to the page_tree. */
> + err = tell_host(vb, vb->inflate_vq);
> +
> + for (; vb->num_pfns != 0; vb->num_pfns--) {
> + struct page *page = pfn_to_page(vb->pfns[vb->num_pfns - 1]);
> + /*
> + * Release our refcount on the page so that it can be reclaimed
> + * when necessary.
> + */
> + page_cache_release(page);
> + }
> + __mark_inode_dirty(&the_inode.inode, I_DIRTY_PAGES);
> +}
> +
> +static inline void update_stat(struct virtio_balloon *vb, int idx,
> + u64 val)
> +{
> + BUG_ON(idx >= VIRTIO_BALLOON_S_NR);
> + vb->stats[idx].tag = idx;
> + vb->stats[idx].val = val;
> +}
> +
> +#define pages_to_bytes(x) ((u64)(x) << PAGE_SHIFT)
> +
> +static inline u32 config_pages(struct virtio_balloon *vb);
> +static void update_balloon_stats(struct virtio_balloon *vb)
> +{
> + unsigned long events[NR_VM_EVENT_ITEMS];
> + struct sysinfo i;
> +
> + all_vm_events(events);
> + si_meminfo(&i);
> +
> + update_stat(vb, VIRTIO_BALLOON_S_SWAP_IN,
> + pages_to_bytes(events[PSWPIN]));
> + update_stat(vb, VIRTIO_BALLOON_S_SWAP_OUT,
> + pages_to_bytes(events[PSWPOUT]));
> + update_stat(vb, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
> + update_stat(vb, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
> +
> + /* Total and Free Mem */
> + update_stat(vb, VIRTIO_BALLOON_S_MEMFREE, pages_to_bytes(i.freeram));
> + update_stat(vb, VIRTIO_BALLOON_S_MEMTOT, pages_to_bytes(i.totalram));
> +}
> +
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> +
> + wake_up(&vb->config_change);
> +}
> +
> +static inline bool config_need_stats(struct virtio_balloon *vb)
> +{
> + u32 v = 0;
> +
> + vb->vdev->config->get(vb->vdev,
> + offsetof(struct virtio_balloon_config,
> + need_stats),
> + &v, sizeof(v));
> + return (v != 0);
> +}
> +
> +static inline u32 config_pages(struct virtio_balloon *vb)
> +{
> + u32 v = 0;
> +
> + vb->vdev->config->get(vb->vdev,
> + offsetof(struct virtio_balloon_config, num_pages),
> + &v, sizeof(v));
> + return v;
> +}
> +
> +static inline s64 towards_target(struct virtio_balloon *vb)
> +{
> + struct address_space *mapping = the_inode.inode.i_mapping;
> + u32 v = config_pages(vb);
> +
> + return (s64)v - (mapping ? mapping->nrpages : 0);
> +}
> +
> +static void update_balloon_size(struct virtio_balloon *vb)
> +{
> + struct address_space *mapping = the_inode.inode.i_mapping;
> + __le32 actual = cpu_to_le32((mapping ? mapping->nrpages : 0));
> +
> + vb->vdev->config->set(vb->vdev,
> + offsetof(struct virtio_balloon_config, actual),
> + &actual, sizeof(actual));
> +}
> +
> +static void update_free_and_total(struct virtio_balloon *vb)
> +{
> + struct sysinfo i;
> + u32 value;
> +
> + si_meminfo(&i);
> +
> + update_balloon_stats(vb);
> + value = i.totalram;
> + vb->vdev->config->set(vb->vdev,
> + offsetof(struct virtio_balloon_config,
> + pages_total),
> + &value, sizeof(value));
> + value = i.freeram;
> + vb->vdev->config->set(vb->vdev,
> + offsetof(struct virtio_balloon_config,
> + pages_free),
> + &value, sizeof(value));
> + value = 0;
> + vb->vdev->config->set(vb->vdev,
> + offsetof(struct virtio_balloon_config,
> + need_stats),
> + &value, sizeof(value));
> +}
> +
> +static int balloon(void *_vballoon)
> +{
> + struct virtio_balloon *vb = _vballoon;
> +
> + set_freezable();
> + while (!kthread_should_stop()) {
> + s64 diff;
> + try_to_freeze();
> + wait_event_interruptible(vb->config_change,
> + (diff = towards_target(vb)) > 0
> + || config_need_stats(vb)
> + || kthread_should_stop()
> + || freezing(current));
> + if (config_need_stats(vb))
> + update_free_and_total(vb);
> + if (diff > 0) {
> + unsigned long reclaim_time = vb->last_reclaim + 2 * HZ;
> + /*
> + * Don't fill the balloon if a page reclaim happened in
> + * the past 2 seconds.
> + */
> + if (time_after_eq(reclaim_time, jiffies)) {
> + /* Inflating too fast--sleep and skip. */
> + msleep(500);
> + } else {
> + fill_balloon(vb, diff);
> + }
> + } else if (diff < 0 && config_pages(vb) == 0) {
> + /*
> + * Here we are specifically looking to detect the case
> + * where there are pages in the page cache, but the
> + * device wants us to go to 0. This is used in save/
> + * restore since the host device doesn't keep track of
> + * PFNs, and must flush the page cache on restore
> + * (which loses the context of the original device
> + * instance). However, we still suggest syncing the
> + * diff so that we can get within the target range.
> + */
> + s64 nr_to_write =
> + (!config_pages(vb) ? LONG_MAX : -diff);
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = nr_to_write,
> + .range_start = 0,
> + .range_end = LLONG_MAX,
> + };
> + sync_inode(&the_inode.inode, &wbc);
> + }
> + update_balloon_size(vb);
> + }
> + return 0;
> +}
> +
> +static ssize_t virtballoon_attr_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf);
> +
> +static DEVICE_ATTR(total_memory, 0644,
> + virtballoon_attr_show, NULL);
> +
> +static DEVICE_ATTR(free_memory, 0644,
> + virtballoon_attr_show, NULL);
> +
> +static DEVICE_ATTR(target_pages, 0644,
> + virtballoon_attr_show, NULL);
> +
> +static DEVICE_ATTR(actual_pages, 0644,
> + virtballoon_attr_show, NULL);
> +
> +static struct attribute *virtballoon_attrs[] = {
> + &dev_attr_total_memory.attr,
> + &dev_attr_free_memory.attr,
> + &dev_attr_target_pages.attr,
> + &dev_attr_actual_pages.attr,
> + NULL
> +};
> +static struct attribute_group virtballoon_attr_group = {
> + .name = "virtballoon",
> + .attrs = virtballoon_attrs,
> +};
> +
> +static ssize_t virtballoon_attr_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct address_space *mapping = the_inode.inode.i_mapping;
> + struct virtio_device *vdev = container_of(dev, struct virtio_device,
> + dev);
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long long value = 0;
> + if (attr == &dev_attr_total_memory)
> + value = vb->stats[VIRTIO_BALLOON_S_MEMTOT].val;
> + else if (attr == &dev_attr_free_memory)
> + value = vb->stats[VIRTIO_BALLOON_S_MEMFREE].val;
> + else if (attr == &dev_attr_target_pages)
> + value = config_pages(vb);
> + else if (attr == &dev_attr_actual_pages)
> + value = cpu_to_le32((mapping ? mapping->nrpages : 0));
> + return sprintf(buf, "%llu\n", value);
> +}
> +
> +static int virtballoon_probe(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb;
> + struct virtqueue *vq[1];
> + vq_callback_t *callback = balloon_ack;
> + const char *name = "inflate";
> + int err;
> +
> + vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
> + if (!vb) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + init_waitqueue_head(&vb->config_change);
> + vb->vdev = vdev;
> +
> + /* We use one virtqueue: inflate */
> + err = vdev->config->find_vqs(vdev, 1, vq, &callback, &name);
> + if (err)
> + goto out_free_vb;
> +
> + vb->inflate_vq = vq[0];
> +
> + err = sysfs_create_group(&vdev->dev.kobj, &virtballoon_attr_group);
> + if (err) {
> + pr_err("Failed to create virtballoon sysfs node\n");
> + goto out_free_vb;
> + }
> +
> + vb->last_scan_page_array = 0;
> + vb->last_reclaim = 0;
> + the_inode.vb = vb;
> +
> + vb->thread = kthread_run(balloon, vb, "vballoon");
> + if (IS_ERR(vb->thread)) {
> + err = PTR_ERR(vb->thread);
> + goto out_del_vqs;
> + }
> +
> + return 0;
> +
> +out_del_vqs:
> + vdev->config->del_vqs(vdev);
> +out_free_vb:
> + kfree(vb);
> +out:
> + return err;
> +}
> +
> +static void __devexit virtballoon_remove(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> +
> + kthread_stop(vb->thread);
> +
> + sysfs_remove_group(&vdev->dev.kobj, &virtballoon_attr_group);
> +
> + /* Now we reset the device so we can clean up the queues. */
> + vdev->config->reset(vdev);
> +
> + vdev->config->del_vqs(vdev);
> + kfree(vb);
> +}
> +
> +static struct virtio_driver virtio_balloon_driver = {
> + .feature_table = NULL,
> + .feature_table_size = 0,
> + .driver.name = KBUILD_MODNAME,
> + .driver.owner = THIS_MODULE,
> + .id_table = id_table,
> + .probe = virtballoon_probe,
> + .remove = __devexit_p(virtballoon_remove),
> + .config_changed = virtballoon_changed,
> +};
> +
> +static int __init init(void)
> +{
> + int err = register_filesystem(&balloon_fs_type);
> + if (err)
> + goto out;
> +
> + balloon_mnt = kern_mount(&balloon_fs_type);
> + if (IS_ERR(balloon_mnt)) {
> + err = PTR_ERR(balloon_mnt);
> + goto out_filesystem;
> + }
> +
> + err = register_virtio_driver(&virtio_balloon_driver);
> + if (err)
> + goto out_filesystem;
> +
> + goto out;
> +
> +out_filesystem:
> + unregister_filesystem(&balloon_fs_type);
> +
> +out:
> + return err;
> +}
> +
> +static void __exit fini(void)
> +{
> + if (balloon_mnt) {
> + unregister_filesystem(&balloon_fs_type);
> + balloon_mnt = NULL;
> + }
> + unregister_virtio_driver(&virtio_balloon_driver);
> +}
> +module_init(init);
> +module_exit(fini);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio file (page cache-backed) balloon driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> index 652dc8b..2be9a02 100644
> --- a/include/linux/virtio_balloon.h
> +++ b/include/linux/virtio_balloon.h
> @@ -41,6 +41,15 @@ struct virtio_balloon_config
> __le32 num_pages;
> /* Number of pages we've actually got in balloon. */
> __le32 actual;
> +#if defined(CONFIG_VIRTIO_FILEBALLOON) ||\
> + defined(CONFIG_VIRTIO_FILEBALLOON_MODULE)
> + /* Total pages on this system. */
> + __le32 pages_total;
> + /* Free pages on this system. */
> + __le32 pages_free;
> + /* If the device needs pages_total/pages_free updated. */
> + __le32 need_stats;
> +#endif
> };
>
> #define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */
> diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
> index 7529b85..2f081d7 100644
> --- a/include/linux/virtio_ids.h
> +++ b/include/linux/virtio_ids.h
> @@ -37,5 +37,6 @@
> #define VIRTIO_ID_RPMSG 7 /* virtio remote processor messaging */
> #define VIRTIO_ID_SCSI 8 /* virtio scsi */
> #define VIRTIO_ID_9P 9 /* 9p virtio console */
> +#define VIRTIO_ID_FILE_BALLOON 10 /* virtio file-backed balloon */
>
> #endif /* _LINUX_VIRTIO_IDS_H */
> --
> 1.7.7.3
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Rik van Riel @ 2012-06-26 21:45 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, Rafael Aquini, kvm, Michael S. Tsirkin,
linux-kernel, virtualization, Ying Han, mikew
In-Reply-To: <CAK+C7kWKtoYC=1fBQ45RTFfNc3p2PwZbLMOefMTtzOfsqe_k8w@mail.gmail.com>
On 06/26/2012 05:31 PM, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel<riel@redhat.com> wrote:
>> The code looks good to me, my only worry is the
>> code duplication. We now have 5 balloon drivers,
>> for 4 hypervisors, all implementing everything
>> from scratch...
>
> Do you have any recommendations on this? I could (I think reasonably
> so) modify the existing virtio_balloon.c and have it change behavior
> based on a feature bit or other configuration. I'm not sure that
> really addresses the root of what you're pointing out--it's still
> adding a different implementation, but doing so as an extension of an
> existing one.
Ideally, I believe we would have two balloon
top parts in a guest (one classical balloon,
one on the LRU), and four bottom parts (kvm,
xen, vmware & s390).
That way the virt specific bits of a balloon
driver would be essentially a ->balloon_page
and ->release_page callback for pages, as well
as methods to communicate with the host.
All the management of pages, including stuff
like putting them on the LRU, or isolating
them for migration, would be done with the
same common code, regardless of what virt
software we are running on.
Of course, that is a substantial amount of
work and I feel it would be unreasonable to
block anyone's code on that kind of thing
(especially considering that your code is good),
but I do believe the explosion of balloon
code is a little worrying.
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Michael S. Tsirkin @ 2012-06-26 21:47 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, Rik van Riel, Rafael Aquini, kvm, linux-kernel,
virtualization, Ying Han, mikew
In-Reply-To: <CAK+C7kWKtoYC=1fBQ45RTFfNc3p2PwZbLMOefMTtzOfsqe_k8w@mail.gmail.com>
On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <riel@redhat.com> wrote:
> > On 06/26/2012 04:32 PM, Frank Swiderski wrote:
> >>
> >> This implementation of a virtio balloon driver uses the page cache to
> >> "store" pages that have been released to the host. The communication
> >> (outside of target counts) is one way--the guest notifies the host when
> >> it adds a page to the page cache, allowing the host to madvise(2) with
> >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> >> (via the regular page reclaim). This means that inflating the balloon
> >> is similar to the existing balloon mechanism, but the deflate is
> >> different--it re-uses existing Linux kernel functionality to
> >> automatically reclaim.
> >>
> >> Signed-off-by: Frank Swiderski<fes@google.com>
> >
> >
> > It is a great idea, but how can this memory balancing
> > possibly work if someone uses memory cgroups inside a
> > guest?
>
> Thanks and good point--this isn't something that I considered in the
> implementation.
>
> > Having said that, we currently do not have proper
> > memory reclaim balancing between cgroups at all, so
> > requiring that of this balloon driver would be
> > unreasonable.
> >
> > The code looks good to me, my only worry is the
> > code duplication. We now have 5 balloon drivers,
> > for 4 hypervisors, all implementing everything
> > from scratch...
>
> Do you have any recommendations on this? I could (I think reasonably
> so) modify the existing virtio_balloon.c and have it change behavior
> based on a feature bit or other configuration. I'm not sure that
> really addresses the root of what you're pointing out--it's still
> adding a different implementation, but doing so as an extension of an
> existing one.
>
> fes
Let's assume it's a feature bit: how would you
formulate what the feature does *from host point of view*?
--
MST
^ permalink raw reply
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Rafael Aquini @ 2012-06-26 22:01 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Michael S. Tsirkin, linux-kernel, virtualization,
linux-mm, Andi Kleen
In-Reply-To: <20120626101729.GF8103@csn.ul.ie>
Mel,
First and foremost, thank you for taking the time to review these bits and
provide such valuable feedback.
On Tue, Jun 26, 2012 at 11:17:29AM +0100, Mel Gorman wrote:
> > +/* return 1 if page is part of a guest's memory balloon, 0 otherwise */
> > +static inline int PageBalloon(struct page *page)
> > +{
> > + return is_balloon_page(page);
> > +}
>
> bool
>
> Why is there both is_balloon_page and PageBalloon?
>
> is_ballon_page is so simple it should just be a static inline here
>
> extern struct address_space *balloon_mapping;
> static inline bool is_balloon_page(page)
> {
> return page->mapping == balloon_mapping;
> }
>
I was thinking about sustain the same syntax other page tests utilize,
but I rather stick to your suggestion on this one.
> > #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > @@ -312,6 +313,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > continue;
> > }
> >
> > + /*
> > + * For ballooned pages, we need to isolate them before testing
> > + * for PageLRU, as well as skip the LRU page isolation steps.
> > + */
>
> This says what, but not why.
>
> I didn't check the exact mechanics of a balloon page but I expect it's that
> balloon pages are not on the LRU. If they are on the LRU, that's pretty dumb.
>
>
> /*
> * Balloon pages can be migrated but are not on the LRU. Isolate
> * them before LRU checks.
> */
>
>
> It would be nicer to do this without gotos
>
> /*
> * It is possible to migrate LRU pages and balloon pages. Skip
> * any other type of page
> */
> if (is_balloon_page(page)) {
> if (!isolate_balloon_page(page))
> continue;
> } else if (PageLRU(page)) {
> ....
> }
>
> You will need to shuffle things around a little to make it work properly
> but if we handle other page types in the future it will be neater
> overall.
>
I'm glad you've put things this way on this one. Despite I was thinking on doing it
the way you suggested, I took the goto approach because I was afraid of doing
otherwise could be considered as an unnecessary radical surgery on established code.
Will do it, certainly.
> > +struct address_space *balloon_mapping;
> > +EXPORT_SYMBOL(balloon_mapping);
> > +
>
> EXPORT_SYMBOL_GPL?
>
> I don't mind how it is exported as such. I'm idly curious if there are
> external closed modules that use the driver.
>
To be honest with you, that was picked with no particular case in mind. And, since
you've raised this question, I'm also curious. However, after giving a thought
on your feedback, I believe EXPORT_SYMBOL_GPL suits far better.
> > +/* ballooned page id check */
> > +int is_balloon_page(struct page *page)
> > +{
> > + struct address_space *mapping = page->mapping;
> > + if (mapping == balloon_mapping)
> > + return 1;
> > + return 0;
> > +}
> > +
> > +/* __isolate_lru_page() counterpart for a ballooned page */
> > +int isolate_balloon_page(struct page *page)
> > +{
> > + struct address_space *mapping = page->mapping;
>
> This is a publicly visible function and while your current usage looks
> correct it would not hurt to do something like this;
>
> if (WARN_ON(!is_page_ballon(page))
> return 0;
>
Excellent point!
> > + if (mapping->a_ops->invalidatepage) {
> > + /*
> > + * We can race against move_to_new_page() and stumble across a
> > + * locked 'newpage'. If we succeed on isolating it, the result
> > + * tends to be disastrous. So, we sanely skip PageLocked here.
> > + */
> > + if (likely(!PageLocked(page) && get_page_unless_zero(page))) {
>
> But the page can get locked after this point.
>
> Would it not be better to do a trylock_page() and unlock the page on
> exit after the isolation completes?
>
Far better, for sure! thanks (again)
> > @@ -78,7 +78,10 @@ void putback_lru_pages(struct list_head *l)
> > list_del(&page->lru);
> > dec_zone_page_state(page, NR_ISOLATED_ANON +
> > page_is_file_cache(page));
> > - putback_lru_page(page);
> > + if (unlikely(PageBalloon(page)))
> > + VM_BUG_ON(!putback_balloon_page(page));
>
> Why not BUG_ON?
>
> What shocked me actually is that VM_BUG_ON code is executed on
> !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> was this deliberate?
>
> Either way, you always want to call putback_ballon_page() so BUG_ON is
> more appropriate although gracefully recovering from the situation and a
> WARN would be better.
>
Shame on me!
I was lazy enough to not carefully read VM_BUG_ON's definition and get its
original purpose. Will change it, for sure.
Once more, thank you!
^ permalink raw reply
* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-06-26 22:11 UTC (permalink / raw)
To: Greg KH
Cc: apw@canonical.com, devel@linuxdriverproject.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org
In-Reply-To: <20120626213954.GA4840@kroah.com>
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Tuesday, June 26, 2012 5:40 PM
> To: KY Srinivasan
> Cc: apw@canonical.com; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
>
> On Tue, Jun 26, 2012 at 02:29:49AM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > Sent: Friday, June 22, 2012 9:26 AM
> > > To: KY Srinivasan
> > > Cc: apw@canonical.com; devel@linuxdriverproject.org; ohering@suse.com;
> > > linux-kernel@vger.kernel.org; virtualization@lists.osdl.org
> > > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> > >
> > > On Fri, Jun 22, 2012 at 01:06:53PM +0000, KY Srinivasan wrote:
> > > > Are you still missing it; do you want me to resend the whole set?
> > >
> > > Nope, it showed up a few hours later, thanks. You really should get
> > > that fixed...
> >
> > Greg, some additional testing with VM replication on Windows Server
> > 2012 has revealed some issues with this patch-set. Please drop the
> > patch set, I will fix the problem and resend it.
>
> Thanks for letting me know, now dropped.
>
> But, for the next round, I suggest you run this patchset by the netdev
> developers, as I think you are doing things in an "odd" way for some of
> the ip determination.
Will do. Just curious; can you be more explicit on what part appeared "odd"
to you.
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH 00/13] drivers: hv: kvp
From: Greg KH @ 2012-06-26 22:22 UTC (permalink / raw)
To: KY Srinivasan
Cc: apw@canonical.com, devel@linuxdriverproject.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155ED64A@SN2PRD0310MB382.namprd03.prod.outlook.com>
On Tue, Jun 26, 2012 at 10:11:55PM +0000, KY Srinivasan wrote:
>
>
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > Sent: Tuesday, June 26, 2012 5:40 PM
> > To: KY Srinivasan
> > Cc: apw@canonical.com; devel@linuxdriverproject.org;
> > virtualization@lists.osdl.org; ohering@suse.com; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> >
> > On Tue, Jun 26, 2012 at 02:29:49AM +0000, KY Srinivasan wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > Sent: Friday, June 22, 2012 9:26 AM
> > > > To: KY Srinivasan
> > > > Cc: apw@canonical.com; devel@linuxdriverproject.org; ohering@suse.com;
> > > > linux-kernel@vger.kernel.org; virtualization@lists.osdl.org
> > > > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> > > >
> > > > On Fri, Jun 22, 2012 at 01:06:53PM +0000, KY Srinivasan wrote:
> > > > > Are you still missing it; do you want me to resend the whole set?
> > > >
> > > > Nope, it showed up a few hours later, thanks. You really should get
> > > > that fixed...
> > >
> > > Greg, some additional testing with VM replication on Windows Server
> > > 2012 has revealed some issues with this patch-set. Please drop the
> > > patch set, I will fix the problem and resend it.
> >
> > Thanks for letting me know, now dropped.
> >
> > But, for the next round, I suggest you run this patchset by the netdev
> > developers, as I think you are doing things in an "odd" way for some of
> > the ip determination.
>
> Will do. Just curious; can you be more explicit on what part appeared "odd"
> to you.
The fact that it was Red Hat specific was the main part, this should be
done in a standard way, with standard tools, right?
greg k-h
^ permalink raw reply
* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-06-26 22:28 UTC (permalink / raw)
To: Greg KH
Cc: apw@canonical.com, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, ohering@suse.com,
linux-kernel@vger.kernel.org
In-Reply-To: <20120626222205.GA5948@kroah.com>
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Tuesday, June 26, 2012 6:22 PM
> To: KY Srinivasan
> Cc: apw@canonical.com; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
>
> On Tue, Jun 26, 2012 at 10:11:55PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > Sent: Tuesday, June 26, 2012 5:40 PM
> > > To: KY Srinivasan
> > > Cc: apw@canonical.com; devel@linuxdriverproject.org;
> > > virtualization@lists.osdl.org; ohering@suse.com; linux-
> kernel@vger.kernel.org
> > > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> > >
> > > On Tue, Jun 26, 2012 at 02:29:49AM +0000, KY Srinivasan wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > > Sent: Friday, June 22, 2012 9:26 AM
> > > > > To: KY Srinivasan
> > > > > Cc: apw@canonical.com; devel@linuxdriverproject.org;
> ohering@suse.com;
> > > > > linux-kernel@vger.kernel.org; virtualization@lists.osdl.org
> > > > > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> > > > >
> > > > > On Fri, Jun 22, 2012 at 01:06:53PM +0000, KY Srinivasan wrote:
> > > > > > Are you still missing it; do you want me to resend the whole set?
> > > > >
> > > > > Nope, it showed up a few hours later, thanks. You really should get
> > > > > that fixed...
> > > >
> > > > Greg, some additional testing with VM replication on Windows Server
> > > > 2012 has revealed some issues with this patch-set. Please drop the
> > > > patch set, I will fix the problem and resend it.
> > >
> > > Thanks for letting me know, now dropped.
> > >
> > > But, for the next round, I suggest you run this patchset by the netdev
> > > developers, as I think you are doing things in an "odd" way for some of
> > > the ip determination.
> >
> > Will do. Just curious; can you be more explicit on what part appeared "odd"
> > to you.
>
> The fact that it was Red Hat specific was the main part, this should be
> done in a standard way, with standard tools, right?
The reason I asked this question was to make sure I address these issues in addition
to whatever I am debugging now. I use the standard tools and calls to retrieve all the
IP configuration. As I look at each distribution the files they keep persistent IP configuration
Information is different and that is the reason I chose to start with RedHat. If there is a
standard way to store the configuration, I will do that.
Regards,
K. Y
>
> greg k-h
>
>
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Frank Swiderski @ 2012-06-26 23:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Andrea Arcangeli, Rik van Riel, Rafael Aquini, kvm, linux-kernel,
virtualization, Ying Han, mikew
In-Reply-To: <20120626214709.GA15406@redhat.com>
On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
>> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <riel@redhat.com> wrote:
>> > On 06/26/2012 04:32 PM, Frank Swiderski wrote:
>> >>
>> >> This implementation of a virtio balloon driver uses the page cache to
>> >> "store" pages that have been released to the host. The communication
>> >> (outside of target counts) is one way--the guest notifies the host when
>> >> it adds a page to the page cache, allowing the host to madvise(2) with
>> >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
>> >> (via the regular page reclaim). This means that inflating the balloon
>> >> is similar to the existing balloon mechanism, but the deflate is
>> >> different--it re-uses existing Linux kernel functionality to
>> >> automatically reclaim.
>> >>
>> >> Signed-off-by: Frank Swiderski<fes@google.com>
>> >
>> >
>> > It is a great idea, but how can this memory balancing
>> > possibly work if someone uses memory cgroups inside a
>> > guest?
>>
>> Thanks and good point--this isn't something that I considered in the
>> implementation.
>>
>> > Having said that, we currently do not have proper
>> > memory reclaim balancing between cgroups at all, so
>> > requiring that of this balloon driver would be
>> > unreasonable.
>> >
>> > The code looks good to me, my only worry is the
>> > code duplication. We now have 5 balloon drivers,
>> > for 4 hypervisors, all implementing everything
>> > from scratch...
>>
>> Do you have any recommendations on this? I could (I think reasonably
>> so) modify the existing virtio_balloon.c and have it change behavior
>> based on a feature bit or other configuration. I'm not sure that
>> really addresses the root of what you're pointing out--it's still
>> adding a different implementation, but doing so as an extension of an
>> existing one.
>>
>> fes
>
> Let's assume it's a feature bit: how would you
> formulate what the feature does *from host point of view*?
>
> --
> MST
In this implementation, the host doesn't keep track of pages in the
balloon, as there is no explicit deflate path. The host device for
this implementation should merely, for example, MADV_DONTNEED on the
pages sent in an inflate. Thus, the inflate becomes a notification
that the guest doesn't need those pages mapped in, but that they
should be available if the guest touches them. In that sense, it's
not a rigid shrink of guest memory. I'm not sure what I'd call the
feature bit though.
Was that the question you were asking, or did I misread?
fes
^ permalink raw reply
* Re: [PATCH 00/13] drivers: hv: kvp
From: Greg KH @ 2012-06-26 23:44 UTC (permalink / raw)
To: KY Srinivasan
Cc: apw@canonical.com, devel@linuxdriverproject.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155ED68D@SN2PRD0310MB382.namprd03.prod.outlook.com>
On Tue, Jun 26, 2012 at 10:28:48PM +0000, KY Srinivasan wrote:
> > The fact that it was Red Hat specific was the main part, this should be
> > done in a standard way, with standard tools, right?
>
> The reason I asked this question was to make sure I address these
> issues in addition to whatever I am debugging now. I use the standard
> tools and calls to retrieve all the IP configuration. As I look at
> each distribution the files they keep persistent IP configuration
> Information is different and that is the reason I chose to start with
> RedHat. If there is a standard way to store the configuration, I will
> do that.
You shouldn't be looking at any files, but rather calling programs to
get the information. Think about systems with NetworkManager, and not
just the older Red Hat static scripts? What about Gentoo and Ubuntu and
Arch and LFS and Slackware and SLES and Fedora? You are quickly going
to go crazy if you are going to be mucking around looking in files for
things.
What exactly are you trying to do here? That might be the better place
to start with instead of the implementation you have created.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Konrad Rzeszutek Wilk @ 2012-06-26 23:57 UTC (permalink / raw)
To: Rafael Aquini
Cc: Rik van Riel, Michael S. Tsirkin, linux-kernel, virtualization,
linux-mm
In-Reply-To: <7f83427b3894af7969c67acc0f27ab5aa68b4279.1340665087.git.aquini@redhat.com>
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> +/*
> + * Balloon pages special page->mapping.
> + * users must properly allocate and initiliaze an instance of balloon_mapping,
initialize
> + * and set it as the page->mapping for balloon enlisted page instances.
> + *
> + * address_space_operations necessary methods for ballooned pages:
> + * .migratepage - used to perform balloon's page migration (as is)
> + * .invalidatepage - used to isolate a page from balloon's page list
> + * .freepage - used to reinsert an isolated page to balloon's page list
> + */
> +struct address_space *balloon_mapping;
> +EXPORT_SYMBOL(balloon_mapping);
Why don't you call this kvm_balloon_mapping - and when other balloon
drivers use it, then change it to something more generic. Also at that
future point the other balloon drivers might do it a bit differently so
it might be that will be reworked completly.
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Rusty Russell @ 2012-06-27 2:56 UTC (permalink / raw)
To: Michael S. Tsirkin, Frank Swiderski
Cc: Andrea Arcangeli, riel, kvm, linux-kernel, mikew, virtualization
In-Reply-To: <20120626214106.GC14054@redhat.com>
On Wed, 27 Jun 2012 00:41:06 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
> > This implementation of a virtio balloon driver uses the page cache to
> > "store" pages that have been released to the host. The communication
> > (outside of target counts) is one way--the guest notifies the host when
> > it adds a page to the page cache, allowing the host to madvise(2) with
> > MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> > (via the regular page reclaim). This means that inflating the balloon
> > is similar to the existing balloon mechanism, but the deflate is
> > different--it re-uses existing Linux kernel functionality to
> > automatically reclaim.
> >
> > Signed-off-by: Frank Swiderski <fes@google.com>
>
> I'm pondering this:
>
> Should it really be a separate driver/device ID?
> If it behaves the same from host POV, maybe it
> should be up to the guest how to inflate/deflate
> the balloon internally?
Well, it shouldn't steal ID 10, either way :) Either use a completely
bogus number, or ask for an id.
But AFAICT this should be a an alternate driver of for the same device:
it's not really a separate device, is it?
Cheers,
Rusty.
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Michael S. Tsirkin @ 2012-06-27 9:02 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, Rik van Riel, Rafael Aquini, kvm, linux-kernel,
virtualization, Ying Han, mikew
In-Reply-To: <CAK+C7kUN-kYVK9AnEhcof98p+eZN1dkt9qVyYppETOeS2n3CMQ@mail.gmail.com>
On Tue, Jun 26, 2012 at 04:21:58PM -0700, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
> >> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <riel@redhat.com> wrote:
> >> > On 06/26/2012 04:32 PM, Frank Swiderski wrote:
> >> >>
> >> >> This implementation of a virtio balloon driver uses the page cache to
> >> >> "store" pages that have been released to the host. The communication
> >> >> (outside of target counts) is one way--the guest notifies the host when
> >> >> it adds a page to the page cache, allowing the host to madvise(2) with
> >> >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> >> >> (via the regular page reclaim). This means that inflating the balloon
> >> >> is similar to the existing balloon mechanism, but the deflate is
> >> >> different--it re-uses existing Linux kernel functionality to
> >> >> automatically reclaim.
> >> >>
> >> >> Signed-off-by: Frank Swiderski<fes@google.com>
> >> >
> >> >
> >> > It is a great idea, but how can this memory balancing
> >> > possibly work if someone uses memory cgroups inside a
> >> > guest?
> >>
> >> Thanks and good point--this isn't something that I considered in the
> >> implementation.
> >>
> >> > Having said that, we currently do not have proper
> >> > memory reclaim balancing between cgroups at all, so
> >> > requiring that of this balloon driver would be
> >> > unreasonable.
> >> >
> >> > The code looks good to me, my only worry is the
> >> > code duplication. We now have 5 balloon drivers,
> >> > for 4 hypervisors, all implementing everything
> >> > from scratch...
> >>
> >> Do you have any recommendations on this? I could (I think reasonably
> >> so) modify the existing virtio_balloon.c and have it change behavior
> >> based on a feature bit or other configuration. I'm not sure that
> >> really addresses the root of what you're pointing out--it's still
> >> adding a different implementation, but doing so as an extension of an
> >> existing one.
> >>
> >> fes
> >
> > Let's assume it's a feature bit: how would you
> > formulate what the feature does *from host point of view*?
> >
> > --
> > MST
>
> In this implementation, the host doesn't keep track of pages in the
> balloon, as there is no explicit deflate path. The host device for
> this implementation should merely, for example, MADV_DONTNEED on the
> pages sent in an inflate. Thus, the inflate becomes a notification
> that the guest doesn't need those pages mapped in, but that they
> should be available if the guest touches them.
So guest access removes the page from the balloon,
since it cancels MADV_DONTNEED, right?
Okay. But what is the meaning of num_pages then?
For example, let's assume I set num_pages to 1,
then guest gives me a page and later accesses this
page. Is guest also required to give me another
page now? Later I send a config interrupt without
changing num_pages. Is guest required to give me another
page now?
> In that sense, it's
> not a rigid shrink of guest memory. I'm not sure what I'd call the
> feature bit though.
>
> Was that the question you were asking, or did I misread?
>
> fes
Yes. It would be a good idea for you to try and write a spec IMO.
Send a patch to virtio.lyx
--
MST
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Michael S. Tsirkin @ 2012-06-27 9:04 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, Rik van Riel, Rafael Aquini, kvm, linux-kernel,
virtualization, Ying Han, mikew
In-Reply-To: <CAK+C7kVHeUz7nUV1vtSHrK6vXorLsZoos82NFm0P0Ux0rEOZGQ@mail.gmail.com>
On Tue, Jun 26, 2012 at 04:45:36PM -0700, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 2:45 PM, Rik van Riel <riel@redhat.com> wrote:
> > On 06/26/2012 05:31 PM, Frank Swiderski wrote:
> >>
> >> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel<riel@redhat.com> wrote:
> >
> >
> >>> The code looks good to me, my only worry is the
> >>> code duplication. We now have 5 balloon drivers,
> >>> for 4 hypervisors, all implementing everything
> >>> from scratch...
> >>
> >>
> >> Do you have any recommendations on this? I could (I think reasonably
> >> so) modify the existing virtio_balloon.c and have it change behavior
> >> based on a feature bit or other configuration. I'm not sure that
> >> really addresses the root of what you're pointing out--it's still
> >> adding a different implementation, but doing so as an extension of an
> >> existing one.
> >
> >
> > Ideally, I believe we would have two balloon
> > top parts in a guest (one classical balloon,
> > one on the LRU), and four bottom parts (kvm,
> > xen, vmware & s390).
> >
> > That way the virt specific bits of a balloon
> > driver would be essentially a ->balloon_page
> > and ->release_page callback for pages, as well
> > as methods to communicate with the host.
> >
> > All the management of pages, including stuff
> > like putting them on the LRU, or isolating
> > them for migration, would be done with the
> > same common code, regardless of what virt
> > software we are running on.
> >
> > Of course, that is a substantial amount of
> > work and I feel it would be unreasonable to
> > block anyone's code on that kind of thing
> > (especially considering that your code is good),
> > but I do believe the explosion of balloon
> > code is a little worrying.
> >
>
> Hm, that makes a lot of sense. That would be a few patches definitely
> worth doing, IMHO. I'm not entirely sure how I feel about inflating
> the balloon drivers in the meantime. Sigh, and I didn't even mean
> that as a pun.
>
> fes
Actually I'm not 100% sure the num_pages interface
of the classical balloon is a good fit for the LRU
balloon. Let's figure that out first: if we fork the interface
there might not be all that much common code ...
--
MST
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Amit Shah @ 2012-06-27 9:40 UTC (permalink / raw)
To: Frank Swiderski
Cc: Andrea Arcangeli, riel, kvm, Michael S. Tsirkin, linux-kernel,
mikew, virtualization
In-Reply-To: <1340742778-11282-1-git-send-email-fes@google.com>
On (Tue) 26 Jun 2012 [13:32:58], Frank Swiderski wrote:
> This implementation of a virtio balloon driver uses the page cache to
> "store" pages that have been released to the host. The communication
> (outside of target counts) is one way--the guest notifies the host when
> it adds a page to the page cache, allowing the host to madvise(2) with
> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
> (via the regular page reclaim). This means that inflating the balloon
> is similar to the existing balloon mechanism, but the deflate is
> different--it re-uses existing Linux kernel functionality to
> automatically reclaim.
This is a good idea for a guest co-operative balloon driver. I don't
think it'll replace the original driver. The traditional balloon
model is essentially driven by the host to increase guest density on
the host. This driver can't work in that case. However, using both
the types of drivers will be helpful, as unused pages on the guest
will be able to be used by the host.
Balbir Singh had done some work earlier on a guest co-operative
balloon driver, but AFAIR it was with modification of the existing
virtio-balloon driver.
I don't think a separate driver is necessary for the functionality,
though. Perhaps just a new config space item which mentions how many
pages are present in the page cache, so that host do some accounting
as well.
Amit
^ permalink raw reply
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Mel Gorman @ 2012-06-27 9:42 UTC (permalink / raw)
To: Andi Kleen
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin, linux-kernel,
virtualization, linux-mm
In-Reply-To: <20120626203400.GA11413@one.firstfloor.org>
On Tue, Jun 26, 2012 at 10:34:00PM +0200, Andi Kleen wrote:
> > How is the compiler meant to optimise away "cond" if it's a function
> > call?
>
> Inlines can be optimized away. These tests are usually inlines.
>
> > What did I miss? If nothing, then I will revert this particular change
> > and Rafael will need to be sure his patch is not using VM_BUG_ON to call
> > a function with side-effects.
>
> Do you have an example where the code is actually different,
> or are you just speculating?
>
> FWIW for my config both generates the same code:
>
> size vmlinux-andi-vmbug vmlinux-vmbug-nothing
> text data bss dec hex filename
> 11809704 1457352 1159168 14426224 dc2070 vmlinux-andi-vmbug
> 11809704 1457352 1159168 14426224 dc2070 vmlinux-vmbug-nothing
>
They are the same size because CONFIG_DEBUG_VM and !CONFIG_DEBUG_VM
generate the same code! I applied the patch below again 3.4 and got the
following results
text data bss dec hex filename
6918617 1795640 2260992 10975249 a77811 vmlinux-default-no-vmdebug
6916633 1795640 2260992 10973265 a77051 vmlinux-patched-no-vmdebug
That's almost 2K of text!
I see now that in 3.5 this was already spotted and fixed by Konstantin
in commit [02602a18: bug: completely remove code generated by disabled
VM_BUG_ON()]. That patch restores the rule that VM_BUG_ON() cannot call
anything with side-effects. So Rafael, watch your use of VM_BUG_ON or
you'll find that the your patches work in 3.4 and leak in 3.5.
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index c04ecfe..ee24ef8 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -4,7 +4,7 @@
#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
-#define VM_BUG_ON(cond) do { (void)(cond); } while (0)
+#define VM_BUG_ON(cond) do { } while (0)
#endif
#ifdef CONFIG_DEBUG_VIRTUAL
^ permalink raw reply related
* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Rafael Aquini @ 2012-06-27 15:17 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Rik van Riel, Michael S. Tsirkin, linux-kernel, virtualization,
linux-mm
In-Reply-To: <20120626235754.GB14782@localhost.localdomain>
On Tue, Jun 26, 2012 at 07:57:55PM -0400, Konrad Rzeszutek Wilk wrote:
> > +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> > +/*
> > + * Balloon pages special page->mapping.
> > + * users must properly allocate and initiliaze an instance of balloon_mapping,
>
> initialize
>
Thanks! will fix it.
> > + * and set it as the page->mapping for balloon enlisted page instances.
> > + *
> > + * address_space_operations necessary methods for ballooned pages:
> > + * .migratepage - used to perform balloon's page migration (as is)
> > + * .invalidatepage - used to isolate a page from balloon's page list
> > + * .freepage - used to reinsert an isolated page to balloon's page list
> > + */
> > +struct address_space *balloon_mapping;
> > +EXPORT_SYMBOL(balloon_mapping);
>
> Why don't you call this kvm_balloon_mapping - and when other balloon
> drivers use it, then change it to something more generic. Also at that
> future point the other balloon drivers might do it a bit differently so
> it might be that will be reworked completly.
Ok, I see your point. However I really think it's better to keep the naming as
generic as possible today and, in the future, those who need to change it a bit can
do it with no pain at all. I believe this way we potentially prevent unnecessary code
duplication, as it will just be a matter of adjusting those preprocessor checking to
include other balloon driver to the scheme, or get rid of all of them (in case all
balloon drivers assume the very same technique for their page mobility primitives).
As I can be utterly wrong on this, lets see if other folks raise the same
concerns about this naming scheme I'm using here. If it ends up being a general
concern that it would be better not being generic at this point, I'll happily
switch my approach to whatever comes up to be the most feasible way of doing it.
Thanks a lot for taking such consideration and provide good feedback on this
work.
^ permalink raw reply
* [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-27 15:27 UTC (permalink / raw)
To: netdev; +Cc: brouer, virtualization, davem, mst
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/virtio_net.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f18149a..36a16d5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -679,11 +679,12 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
{
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
- int ret;
+ struct sockaddr *addr = p;
- ret = eth_mac_addr(dev, p);
- if (ret)
- return ret;
+ if (!is_valid_ether_addr(addr->sa_data))
+ return -EADDRNOTAVAIL;
+ memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+ dev->addr_assign_type &= ~NET_ADDR_RANDOM;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
--
1.7.10.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox