* [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together
@ 2008-04-04 0:05 Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
` (5 more replies)
0 siblings, 6 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
Hi hotplug-memory people,
Here's my current set of WIP patches for using hotplug memory with the
Xen balloon driver. I'm including all the xen-balloon patches so you
have context for evaluating my hotplug-memory changes.
The gist of the changes to the hotplug memory subsystem is:
- split online_pages up so that users can grow the zones and actually
online the pages as separate operations,
- remove a bunch of duplicate definitions of online_page() (this is
just a generally useful cleanup, and it serves to make the later
change smaller)
- add a section_ops structure, which defines functions to online and
offline pages. By default this just calls the standard
online_page() function, but the xen-balloon driver can install its
own version which avoids onlining pages with no backing store.
The latter is my generalization of Kamezawa-san's suggestion to put a
hook in the x86-32 online_page() function. My problems with this idea
are: 1) how should the Xen callback know whether it should do anything
special with a given page, and 2) what would happen if two people
wanted to use the hook? Generally, when we want different kinds of
the same object to have different behaviours, we add an ops structure
and fill it out appropriately, so that's what I decided to do.
Unfortunately threading the ops pointer through the callchain makes
the patch a little large, but not very complex.
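For concreteness, the hook boils down to something like this (a sketch;
the real definition is in patch 5):

    struct section_ops {
            /* return true if the page was actually onlined/offlined */
            bool (*online_page)(struct page *page);
            bool (*offline_page)(struct page *page);
    };

Each mem_section carries a pointer to one of these. The default ops just
call the common online_page(); the Xen balloon driver installs an
online_page hook which skips pages that have no backing store.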
Thanks,
J
* [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
2008-04-04 1:06 ` Dave Hansen
` (2 more replies)
2008-04-04 0:05 ` [PATCH 2 of 6] xen: make phys_to_machine structure dynamic Jeremy Fitzhardinge
` (4 subsequent siblings)
5 siblings, 3 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
The Xen balloon driver needs to separate the process of hot-installing
memory into two phases: one to allocate the page structures and
configure the zones, and another to actually online the pages of newly
installed memory.
This patch splits up the innards of online_pages() into two pieces
which correspond to these two phases. The behaviour of online_pages()
itself is unchanged.
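A caller that wants the two phases separately does roughly this (a
sketch of what online_pages() itself now does internally):

    int ret;

    /* phase 1: MEM_GOING_ONLINE notification plus zone/pgdat growth */
    ret = prepare_online_pages(pfn, nr_pages);
    if (ret)
            return ret;

    /* phase 2: hand the pages to the allocator, possibly much later,
     * and possibly one page at a time */
    mark_pages_onlined(pfn, nr_pages);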
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---
include/linux/memory_hotplug.h | 3 +
mm/memory_hotplug.c | 102 ++++++++++++++++++++++++++--------------
2 files changed, 71 insertions(+), 34 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -57,7 +57,10 @@
/* need some defines for these for archs that don't support it */
extern void online_page(struct page *page);
/* VM interface that may be used by firmware interface */
+extern int prepare_online_pages(unsigned long pfn, unsigned long nr_pages);
+extern unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages);
extern int online_pages(unsigned long, unsigned long);
+
extern void __offline_isolated_pages(unsigned long, unsigned long);
extern int offline_pages(unsigned long, unsigned long, unsigned long);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -134,10 +134,32 @@
}
EXPORT_SYMBOL_GPL(__add_pages);
-static void grow_zone_span(struct zone *zone,
+static void grow_pgdat_span(struct pglist_data *pgdat,
unsigned long start_pfn, unsigned long end_pfn)
{
+ unsigned long old_pgdat_end_pfn =
+ pgdat->node_start_pfn + pgdat->node_spanned_pages;
+
+ if (start_pfn < pgdat->node_start_pfn)
+ pgdat->node_start_pfn = start_pfn;
+
+ pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
+ pgdat->node_start_pfn;
+}
+
+static void grow_zone_span(unsigned long start_pfn, unsigned long end_pfn)
+{
unsigned long old_zone_end_pfn;
+ struct zone *zone;
+ unsigned long flags;
+
+ /*
+ * This doesn't need a lock to do pfn_to_page().
+ * The section can't be removed here because of the
+ * memory_block->state_sem.
+ */
+ zone = page_zone(pfn_to_page(start_pfn));
+ pgdat_resize_lock(zone->zone_pgdat, &flags);
zone_span_writelock(zone);
@@ -149,19 +171,9 @@
zone->zone_start_pfn;
zone_span_writeunlock(zone);
-}
-static void grow_pgdat_span(struct pglist_data *pgdat,
- unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long old_pgdat_end_pfn =
- pgdat->node_start_pfn + pgdat->node_spanned_pages;
-
- if (start_pfn < pgdat->node_start_pfn)
- pgdat->node_start_pfn = start_pfn;
-
- pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
- pgdat->node_start_pfn;
+ grow_pgdat_span(zone->zone_pgdat, start_pfn, end_pfn);
+ pgdat_resize_unlock(zone->zone_pgdat, &flags);
}
static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
@@ -180,16 +192,12 @@
return 0;
}
-
-int online_pages(unsigned long pfn, unsigned long nr_pages)
+/* Tell anyone who's interested that we're onlining some memory */
+static int notify_going_online(unsigned long pfn, unsigned long nr_pages)
{
- unsigned long flags;
- unsigned long onlined_pages = 0;
- struct zone *zone;
- int need_zonelists_rebuild = 0;
+ struct memory_notify arg;
int nid;
int ret;
- struct memory_notify arg;
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
@@ -201,20 +209,18 @@
ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
- if (ret) {
+ if (ret)
memory_notify(MEM_CANCEL_ONLINE, &arg);
- return ret;
- }
- /*
- * This doesn't need a lock to do pfn_to_page().
- * The section can't be removed here because of the
- * memory_block->state_sem.
- */
- zone = page_zone(pfn_to_page(pfn));
- pgdat_resize_lock(zone->zone_pgdat, &flags);
- grow_zone_span(zone, pfn, pfn + nr_pages);
- grow_pgdat_span(zone->zone_pgdat, pfn, pfn + nr_pages);
- pgdat_resize_unlock(zone->zone_pgdat, &flags);
+
+ return ret;
+}
+
+/* Mark a set of pages as online */
+unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages)
+{
+ struct zone *zone = page_zone(pfn_to_page(pfn));
+ unsigned long onlined_pages = 0;
+ int need_zonelists_rebuild = 0;
/*
* If this zone is not populated, then it is not in zonelist.
@@ -240,10 +246,38 @@
vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
- if (onlined_pages)
+ if (onlined_pages) {
+ struct memory_notify arg;
+
+ arg.start_pfn = pfn; /* ? */
+ arg.nr_pages = onlined_pages;
+ arg.status_change_nid = -1; /* ? */
+
memory_notify(MEM_ONLINE, &arg);
+ }
+ return onlined_pages;
+}
+
+int prepare_online_pages(unsigned long pfn, unsigned long nr_pages)
+{
+ int ret = notify_going_online(pfn, nr_pages);
+ if (ret)
+ return ret;
+
+ grow_zone_span(pfn, pfn+nr_pages);
return 0;
+}
+
+int online_pages(unsigned long pfn, unsigned long nr_pages)
+{
+ int ret;
+
+ ret = prepare_online_pages(pfn, nr_pages);
+ if (ret == 0)
+ mark_pages_onlined(pfn, nr_pages);
+
+ return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
* [PATCH 2 of 6] xen: make phys_to_machine structure dynamic
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 3 of 6] xen-balloon: use memory hot-add to expand the domain's memory Jeremy Fitzhardinge
` (3 subsequent siblings)
5 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
We now support the use of memory hotplug, so the physical-to-machine
page mapping structure must be dynamic. This is implemented as a
two-level radix tree, which allows us to allocate memory for the p2m
table incrementally and efficiently as new pages are added.
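To make the layout concrete: with 4k pages and 4-byte entries (32-bit),
each leaf page holds 1024 mfns, so a lookup is just the arithmetic below
(a sketch of what get_phys_to_machine() does):

    unsigned topidx = pfn / P2M_ENTRIES_PER_PAGE;   /* which leaf page */
    unsigned idx    = pfn % P2M_ENTRIES_PER_PAGE;   /* slot within it  */
    unsigned long mfn = p2m_top[topidx] ? p2m_top[topidx][idx]
                                        : INVALID_P2M_ENTRY;

Leaf pages are only allocated when a valid entry is first stored into
them.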
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/xen/enlighten.c | 2 -
arch/x86/xen/mmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/xen/setup.c | 2 -
arch/x86/xen/xen-ops.h | 2 +
include/asm-x86/xen/page.h | 20 +++-------
5 files changed, 94 insertions(+), 17 deletions(-)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1207,7 +1207,7 @@
/* Get mfn list */
if (!xen_feature(XENFEAT_auto_translated_physmap))
- phys_to_machine_mapping = (unsigned long *)xen_start_info->mfn_list;
+ xen_build_dynamic_phys_to_machine();
pgd = (pgd_t *)xen_start_info->pt_base;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -55,6 +55,91 @@
#include "multicalls.h"
#include "mmu.h"
+
+/*
+ * This should probably be a config option. On 32-bit, it costs 1
+ * page/gig of memory; on 64-bit it's 2 pages/gig. If we want it to be
+ * completely unbounded we can add another level to the p2m structure.
+ */
+#define MAX_GUEST_PAGES (16ull * 1024*1024*1024 / PAGE_SIZE)
+#define P2M_ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(unsigned long))
+
+static unsigned long *p2m_top[MAX_GUEST_PAGES / P2M_ENTRIES_PER_PAGE];
+
+static inline unsigned p2m_top_index(unsigned long pfn)
+{
+ BUG_ON(pfn >= MAX_GUEST_PAGES);
+ return pfn / P2M_ENTRIES_PER_PAGE;
+}
+
+static inline unsigned p2m_index(unsigned long pfn)
+{
+ return pfn % P2M_ENTRIES_PER_PAGE;
+}
+
+void __init xen_build_dynamic_phys_to_machine(void)
+{
+ unsigned pfn;
+ unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
+
+ BUG_ON(xen_start_info->nr_pages >= MAX_GUEST_PAGES);
+
+ for(pfn = 0;
+ pfn < xen_start_info->nr_pages;
+ pfn += P2M_ENTRIES_PER_PAGE) {
+ unsigned topidx = p2m_top_index(pfn);
+
+ p2m_top[topidx] = &mfn_list[pfn];
+ }
+}
+
+unsigned long get_phys_to_machine(unsigned long pfn)
+{
+ unsigned topidx, idx;
+
+ topidx = p2m_top_index(pfn);
+ if (p2m_top[topidx] == NULL)
+ return INVALID_P2M_ENTRY;
+
+ idx = p2m_index(pfn);
+ return p2m_top[topidx][idx];
+}
+
+static void alloc_p2m(unsigned long **pp)
+{
+ unsigned long *p;
+ unsigned i;
+
+ p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL);
+ BUG_ON(p == NULL);
+
+ for(i = 0; i < P2M_ENTRIES_PER_PAGE; i++)
+ p[i] = INVALID_P2M_ENTRY;
+
+ if (cmpxchg(pp, NULL, p) != NULL)
+ free_page((unsigned long)p);
+}
+
+void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+ unsigned topidx, idx;
+
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+ BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+ return;
+ }
+
+ topidx = p2m_top_index(pfn);
+ if (p2m_top[topidx] == NULL) {
+ /* no need to allocate a page to store an invalid entry */
+ if (mfn == INVALID_P2M_ENTRY)
+ return;
+ alloc_p2m(&p2m_top[topidx]);
+ }
+
+ idx = p2m_index(pfn);
+ p2m_top[topidx][idx] = mfn;
+}
xmaddr_t arbitrary_virt_to_machine(unsigned long address)
{
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -27,8 +27,6 @@
extern const char xen_hypervisor_callback[];
extern const char xen_failsafe_callback[];
-unsigned long *phys_to_machine_mapping;
-EXPORT_SYMBOL(phys_to_machine_mapping);
/**
* machine_specific_memory_setup - Hook for machine specific memory setup.
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -21,6 +21,8 @@
void __init xen_arch_setup(void);
void __init xen_init_IRQ(void);
void xen_enable_sysenter(void);
+
+void __init xen_build_dynamic_phys_to_machine(void);
void xen_setup_timer(int cpu);
void xen_setup_cpu_clockevents(void);
diff --git a/include/asm-x86/xen/page.h b/include/asm-x86/xen/page.h
--- a/include/asm-x86/xen/page.h
+++ b/include/asm-x86/xen/page.h
@@ -26,15 +26,15 @@
#define FOREIGN_FRAME_BIT (1UL<<31)
#define FOREIGN_FRAME(m) ((m) | FOREIGN_FRAME_BIT)
-extern unsigned long *phys_to_machine_mapping;
+extern unsigned long get_phys_to_machine(unsigned long pfn);
+extern void set_phys_to_machine(unsigned long pfn, unsigned long mfn);
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
if (xen_feature(XENFEAT_auto_translated_physmap))
return pfn;
- return phys_to_machine_mapping[(unsigned int)(pfn)] &
- ~FOREIGN_FRAME_BIT;
+ return get_phys_to_machine(pfn) & ~FOREIGN_FRAME_BIT;
}
static inline int phys_to_machine_mapping_valid(unsigned long pfn)
@@ -42,7 +42,7 @@
if (xen_feature(XENFEAT_auto_translated_physmap))
return 1;
- return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
+ return get_phys_to_machine(pfn) != INVALID_P2M_ENTRY;
}
static inline unsigned long mfn_to_pfn(unsigned long mfn)
@@ -106,18 +106,10 @@
unsigned long pfn = mfn_to_pfn(mfn);
if ((pfn < max_mapnr)
&& !xen_feature(XENFEAT_auto_translated_physmap)
- && (phys_to_machine_mapping[pfn] != mfn))
+ && (get_phys_to_machine(pfn) != mfn))
return max_mapnr; /* force !pfn_valid() */
+ /* XXX fixme; not true with sparsemem */
return pfn;
-}
-
-static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
-{
- if (xen_feature(XENFEAT_auto_translated_physmap)) {
- BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
- return;
- }
- phys_to_machine_mapping[pfn] = mfn;
}
/* VIRT <-> MACHINE conversion */
* [PATCH 3 of 6] xen-balloon: use memory hot-add to expand the domain's memory
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 2 of 6] xen: make phys_to_machine structure dynamic Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 4 of 6] hotplug-memory: use common online_page Jeremy Fitzhardinge
` (2 subsequent siblings)
5 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
When expanding a domain beyond its initial size, we must introduce new
page structures to describe the newly added memory. In the past this has
been done by over-allocating the mem_map[] array to allow space to grow.
This has two disadvantages:
1 - it sets an upper limit at domain-creation time which cannot be
later changed, and
2 - it sets a lower limit on how small a domain can be shrunk, because
lowmem gets filled with page structures.
This patch uses a different approach. When the balloon driver wants to
expand the domain, it hot-adds new memory via the hotplug memory driver,
which will allocate new page structures for the memory. As the balloon
driver gets backing memory from the hypervisor, it incrementally onlines
the pages.
Note that memory is never hot-unplugged, so growing a domain to a
large size then shrinking it again will still leave memory full of
page structures.
NB: the newly hotplugged memory will appear in sysfs under
/sys/devices/system/memory/memoryX. This memory can be manually onlined by
"echo online > /sys/devices/system/memory/memoryX/state". DO NOT DO THIS!
The "memory" has no underlying memory until the balloon driver puts it
there, and the domain will crash.
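For reference, the expansion path condenses to this (error handling
omitted; see balloon_expand() in the diff):

    allocate_resource(&iomem_resource, res, size, ...); /* find a hole */
    add_memory_resource(0, res);            /* hot-add: struct pages   */
    prepare_online_pages(start_pfn, pages); /* notify + grow the zone  */
    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
            SetPageReserved(pfn_to_page(pfn));      /* no backing yet  */
            set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
            balloon_append(pfn_to_page(pfn));       /* park in balloon */
    }

The pages are then onlined one at a time in increase_reservation(), once
the hypervisor has supplied frames for them.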
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---
drivers/xen/balloon.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 91 insertions(+), 9 deletions(-)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -40,6 +40,8 @@
#include <linux/bootmem.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
+#include <linux/ioport.h>
+#include <linux/memory_hotplug.h>
#include <linux/mutex.h>
#include <linux/highmem.h>
#include <linux/list.h>
@@ -60,7 +62,7 @@
#define PAGES2KB(_p) ((_p)<<(PAGE_SHIFT-10))
-#define BALLOON_CLASS_NAME "memory"
+#define BALLOON_CLASS_NAME "xen_memory"
struct balloon_stats {
/* We aim for 'current allocation' == 'target allocation'. */
@@ -137,6 +139,60 @@
}
}
+/* hotplug some memory we can add pages to */
+static void balloon_expand(unsigned pages)
+{
+#ifdef CONFIG_MEMORY_HOTPLUG
+ struct resource *res;
+ int ret;
+ u64 size = (u64)pages * PAGE_SIZE;
+ unsigned pfn;
+ unsigned start_pfn, end_pfn;
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res)
+ return;
+
+ res->name = "Xen Balloon";
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+
+ ret = allocate_resource(&iomem_resource, res, size, 0, -1,
+ 1ul << SECTION_SIZE_BITS, NULL, NULL);
+
+ if (ret)
+ goto free_res;
+
+ start_pfn = res->start >> PAGE_SHIFT;
+ end_pfn = (res->end + 1) >> PAGE_SHIFT;
+
+ ret = add_memory_resource(0, res);
+ if (ret)
+ goto release_res;
+
+ ret = prepare_online_pages(start_pfn, pages);
+ if (ret)
+ goto release_res;
+
+ for(pfn = start_pfn; pfn < end_pfn; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+
+ SetPageReserved(page);
+ set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ balloon_append(page);
+ }
+
+ return;
+
+ release_res:
+ release_resource(res);
+
+ free_res:
+ kfree(res);
+
+ printk(KERN_INFO "balloon_expand failed: ret = %d\n", ret);
+#endif /* CONFIG_MEMORY_HOTPLUG */
+}
+
/* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
static struct page *balloon_retrieve(void)
{
@@ -178,14 +234,23 @@
schedule_work(&balloon_worker);
}
+static unsigned long balloon_remains(void)
+{
+ return balloon_stats.balloon_low + balloon_stats.balloon_high;
+}
+
static unsigned long current_target(void)
{
- unsigned long target = min(balloon_stats.target_pages, balloon_stats.hard_limit);
+ unsigned long target;
+ target = min(balloon_stats.target_pages, balloon_stats.hard_limit);
+
+#ifndef CONFIG_MEMORY_HOTPLUG
target = min(target,
balloon_stats.current_pages +
balloon_stats.balloon_low +
balloon_stats.balloon_high);
+#endif
return target;
}
@@ -241,20 +306,29 @@
set_phys_to_machine(pfn, frame_list[i]);
+ /* Relinquish the page back to the allocator. */
+ mark_pages_onlined(pfn, 1);
+
/* Link back into the page tables if not highmem. */
- if (pfn < max_low_pfn) {
+ if (!PageHighMem(page)) {
int ret;
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
mfn_pte(frame_list[i], PAGE_KERNEL),
0);
+
+ if (ret) {
+ struct zone *zone = page_zone(page);
+
+ printk("failed to map pfn %lu max_low_pfn %lu "
+ "addr %p mfn %lx; ret=%d; "
+ "page->flags=%lx, zone=%d\n",
+ pfn, max_low_pfn, __va(pfn << PAGE_SHIFT),
+ frame_list[i], ret, page->flags,
+ zone - zone->zone_pgdat->node_zones);
+ }
BUG_ON(ret);
}
-
- /* Relinquish the page back to the allocator. */
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
}
balloon_stats.current_pages += nr_pages;
@@ -282,6 +356,8 @@
for (i = 0; i < nr_pages; i++) {
if ((page = alloc_page(GFP_BALLOON)) == NULL) {
+ printk("managed to get %d/%d pages\n",
+ i, nr_pages);
nr_pages = i;
need_sleep = 1;
break;
@@ -330,8 +406,13 @@
do {
credit = current_target() - balloon_stats.current_pages;
- if (credit > 0)
+ if (credit > 0) {
+ if (credit > balloon_remains()) {
+ /* Give ourselves some pages to balloon into */
+ balloon_expand(PAGES_PER_SECTION);
+ }
need_sleep = (increase_reservation(credit) != 0);
+ }
if (credit < 0)
need_sleep = (decrease_reservation(-credit) != 0);
@@ -426,6 +507,7 @@
if (!PageReserved(page))
balloon_append(page);
}
+
target_watch.callback = watch_target;
xenstore_notifier.notifier_call = balloon_init_watcher;
* [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
` (2 preceding siblings ...)
2008-04-04 0:05 ` [PATCH 3 of 6] xen-balloon: use memory hot-add to expand the domain's memory Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
2008-04-04 0:47 ` Dave Hansen
2008-04-04 0:05 ` [PATCH 5 of 6] hotplug-memory: add section_ops Jeremy Fitzhardinge
2008-04-04 0:05 ` [PATCH 6 of 6] xen-balloon: define a section_ops Jeremy Fitzhardinge
5 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
Apart from x86-32, all architectures use an identical online_page.
Define it weakly in mm/memory_hotplug.c so that architectures can
override it if necessary.
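For example, x86-32 keeps its own non-weak definition, which the linker
picks in preference to the weak default, because hot-added pages there
land in highmem:

    void __meminit online_page(struct page *page)
    {
            ClearPageReserved(page);
            add_one_highpage_hotplug(page, page_to_pfn(page));
    }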
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/ia64/mm/init.c | 9 ---------
arch/powerpc/mm/mem.c | 9 ---------
arch/sh/mm/init.c | 9 ---------
arch/x86/mm/init_64.c | 9 ---------
mm/memory_hotplug.c | 9 +++++++++
5 files changed, 9 insertions(+), 36 deletions(-)
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -690,15 +690,6 @@
}
#ifdef CONFIG_MEMORY_HOTPLUG
-void online_page(struct page *page)
-{
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
- totalram_pages++;
- num_physpages++;
-}
-
int arch_add_memory(int nid, u64 start, u64 size)
{
pg_data_t *pgdat;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -94,15 +94,6 @@
EXPORT_SYMBOL(phys_mem_access_prot);
#ifdef CONFIG_MEMORY_HOTPLUG
-
-void online_page(struct page *page)
-{
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
- totalram_pages++;
- num_physpages++;
-}
#ifdef CONFIG_NUMA
int memory_add_physaddr_to_nid(u64 start)
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -307,15 +307,6 @@
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
-void online_page(struct page *page)
-{
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
- totalram_pages++;
- num_physpages++;
-}
-
int arch_add_memory(int nid, u64 start, u64 size)
{
pg_data_t *pgdat;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -624,15 +624,6 @@
/*
* Memory hotplug specific functions
*/
-void online_page(struct page *page)
-{
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
- totalram_pages++;
- num_physpages++;
-}
-
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Memory is added always to NORMAL zone. This means you will never get
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -145,6 +145,15 @@
pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
pgdat->node_start_pfn;
+}
+
+__weak void online_page(struct page *page)
+{
+ ClearPageReserved(page);
+ init_page_count(page);
+ __free_page(page);
+ totalram_pages++;
+ num_physpages++;
}
static void grow_zone_span(unsigned long start_pfn, unsigned long end_pfn)
* [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
` (3 preceding siblings ...)
2008-04-04 0:05 ` [PATCH 4 of 6] hotplug-memory: use common online_page Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
2008-04-04 0:51 ` Dave Hansen
2008-04-04 0:05 ` [PATCH 6 of 6] xen-balloon: define a section_ops Jeremy Fitzhardinge
5 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
Add a per-section "section_ops" structure, allowing each section to have
specific functions for onlining and offlining pages within the section.
This is used by the Xen balloon driver to make sure that pages are not
onlined without some physical memory backing them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/ia64/mm/init.c | 4 ++--
arch/powerpc/mm/mem.c | 5 +++--
arch/sh/mm/init.c | 4 ++--
arch/x86/mm/init_32.c | 7 ++++---
arch/x86/mm/init_64.c | 4 ++--
drivers/xen/balloon.c | 2 +-
include/linux/memory_hotplug.h | 18 +++++++++++++-----
include/linux/mmzone.h | 2 ++
mm/memory_hotplug.c | 32 +++++++++++++++++++++-----------
mm/sparse.c | 3 ++-
10 files changed, 52 insertions(+), 29 deletions(-)
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -690,7 +690,7 @@
}
#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size)
+int arch_add_memory(int nid, u64 start, u64 size, const struct section_ops *ops)
{
pg_data_t *pgdat;
struct zone *zone;
@@ -701,7 +701,7 @@
pgdat = NODE_DATA(nid);
zone = pgdat->node_zones + ZONE_NORMAL;
- ret = __add_pages(zone, start_pfn, nr_pages);
+ ret = __add_pages(zone, start_pfn, nr_pages, ops);
if (ret)
printk("%s: Problem encountered in __add_pages() as ret=%d\n",
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -102,7 +102,8 @@
}
#endif
-int __devinit arch_add_memory(int nid, u64 start, u64 size)
+int __devinit arch_add_memory(int nid, u64 start, u64 size,
+ const struct section_ops *ops)
{
struct pglist_data *pgdata;
struct zone *zone;
@@ -117,7 +118,7 @@
/* this should work for most non-highmem platforms */
zone = pgdata->node_zones;
- return __add_pages(zone, start_pfn, nr_pages);
+ return __add_pages(zone, start_pfn, nr_pages, ops);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -307,7 +307,7 @@
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size)
+int arch_add_memory(int nid, u64 start, u64 size, const struct section_ops *ops)
{
pg_data_t *pgdat;
unsigned long start_pfn = start >> PAGE_SHIFT;
@@ -317,7 +317,7 @@
pgdat = NODE_DATA(nid);
/* We only have ZONE_NORMAL, so this is easy.. */
- ret = __add_pages(pgdat->node_zones + ZONE_NORMAL, start_pfn, nr_pages);
+ ret = __add_pages(pgdat->node_zones + ZONE_NORMAL, start_pfn, nr_pages, ops);
if (unlikely(ret))
printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -322,10 +322,11 @@
* has been added dynamically that would be
* onlined here is in HIGHMEM.
*/
-void __meminit online_page(struct page *page)
+bool __meminit online_page(struct page *page)
{
ClearPageReserved(page);
add_one_highpage_hotplug(page, page_to_pfn(page));
+ return true;
}
#ifndef CONFIG_NUMA
@@ -706,14 +707,14 @@
}
#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size)
+int arch_add_memory(int nid, u64 start, u64 size, const struct section_ops *ops)
{
struct pglist_data *pgdata = NODE_DATA(nid);
struct zone *zone = pgdata->node_zones + ZONE_HIGHMEM;
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- return __add_pages(zone, start_pfn, nr_pages);
+ return __add_pages(zone, start_pfn, nr_pages, ops);
}
#endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -629,7 +629,7 @@
* Memory is added always to NORMAL zone. This means you will never get
* additional DMA/DMA32 memory.
*/
-int arch_add_memory(int nid, u64 start, u64 size)
+int arch_add_memory(int nid, u64 start, u64 size, const struct section_ops *ops)
{
struct pglist_data *pgdat = NODE_DATA(nid);
struct zone *zone = pgdat->node_zones + ZONE_NORMAL;
@@ -641,7 +641,7 @@
if (last_mapped_pfn > max_pfn_mapped)
max_pfn_mapped = last_mapped_pfn;
- ret = __add_pages(zone, start_pfn, nr_pages);
+ ret = __add_pages(zone, start_pfn, nr_pages, ops);
WARN_ON(1);
return ret;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -165,7 +165,7 @@
start_pfn = res->start >> PAGE_SHIFT;
end_pfn = (res->end + 1) >> PAGE_SHIFT;
- ret = add_memory_resource(0, res);
+ ret = add_memory_resource(0, res, &default_section_ops);
if (ret)
goto release_res;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -8,6 +8,10 @@
struct page;
struct zone;
struct pglist_data;
+struct section_ops {
+ bool (*online_page)(struct page *);
+ bool (*offline_page)(struct page *);
+};
#ifdef CONFIG_MEMORY_HOTPLUG
/*
@@ -55,7 +59,7 @@
extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
/* need some defines for these for archs that don't support it */
-extern void online_page(struct page *page);
+extern bool online_page(struct page *page);
/* VM interface that may be used by firmware interface */
extern int prepare_online_pages(unsigned long pfn, unsigned long nr_pages);
extern unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages);
@@ -66,7 +70,7 @@
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
- unsigned long nr_pages);
+ unsigned long nr_pages, const struct section_ops *ops);
/*
* Walk through all memory which is registered as a resource.
@@ -176,11 +180,15 @@
struct resource;
+extern const struct section_ops default_section_ops;
+
extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *res);
-extern int arch_add_memory(int nid, u64 start, u64 size);
+extern int add_memory_resource(int nid, struct resource *res,
+ const struct section_ops *ops);
+extern int arch_add_memory(int nid, u64 start, u64 size,
+ const struct section_ops *ops);
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
- int nr_pages);
+ int nr_pages, const struct section_ops *ops);
#endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -809,6 +809,8 @@
/* See declaration of similar field in struct zone */
unsigned long *pageblock_flags;
+
+ const struct section_ops *ops;
};
#ifdef CONFIG_SPARSEMEM_EXTREME
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -80,7 +80,8 @@
return 0;
}
-static int __add_section(struct zone *zone, unsigned long phys_start_pfn)
+static int __add_section(struct zone *zone, unsigned long phys_start_pfn,
+ const struct section_ops *ops)
{
int nr_pages = PAGES_PER_SECTION;
int ret;
@@ -88,7 +89,7 @@
if (pfn_valid(phys_start_pfn))
return -EEXIST;
- ret = sparse_add_one_section(zone, phys_start_pfn, nr_pages);
+ ret = sparse_add_one_section(zone, phys_start_pfn, nr_pages, ops);
if (ret < 0)
return ret;
@@ -108,7 +109,7 @@
* add the new pages.
*/
int __add_pages(struct zone *zone, unsigned long phys_start_pfn,
- unsigned long nr_pages)
+ unsigned long nr_pages, const struct section_ops *ops)
{
unsigned long i;
int err = 0;
@@ -118,7 +119,7 @@
end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
for (i = start_sec; i <= end_sec; i++) {
- err = __add_section(zone, i << PFN_SECTION_SHIFT);
+ err = __add_section(zone, i << PFN_SECTION_SHIFT, ops);
/*
* EEXIST is finally dealt with by ioresource collision
@@ -147,13 +148,14 @@
pgdat->node_start_pfn;
}
-__weak void online_page(struct page *page)
+__weak bool online_page(struct page *page)
{
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
totalram_pages++;
num_physpages++;
+ return true;
}
static void grow_zone_span(unsigned long start_pfn, unsigned long end_pfn)
@@ -193,9 +195,13 @@
struct page *page;
if (PageReserved(pfn_to_page(start_pfn)))
for (i = 0; i < nr_pages; i++) {
- page = pfn_to_page(start_pfn + i);
- online_page(page);
- onlined_pages++;
+ unsigned long pfn = start_pfn + i;
+ struct mem_section *ms = __pfn_to_section(pfn);
+
+ page = pfn_to_page(pfn);
+
+ if (ms->ops->online_page(page))
+ onlined_pages++;
}
*(unsigned long *)arg = onlined_pages;
return 0;
@@ -318,6 +324,10 @@
return;
}
+const struct section_ops default_section_ops = {
+ .online_page = online_page,
+ .offline_page = NULL,
+};
int add_memory(int nid, u64 start, u64 size)
{
@@ -328,7 +338,7 @@
if (!res)
return -EEXIST;
- ret = add_memory_resource(nid, res);
+ ret = add_memory_resource(nid, res, &default_section_ops);
if (ret)
release_memory_resource(res);
@@ -336,7 +346,7 @@
return ret;
}
-int add_memory_resource(int nid, struct resource *res)
+int add_memory_resource(int nid, struct resource *res, const struct section_ops *ops)
{
pg_data_t *pgdat = NULL;
int new_pgdat = 0;
@@ -352,7 +362,7 @@
}
/* call arch's memory hotadd */
- ret = arch_add_memory(nid, start, size);
+ ret = arch_add_memory(nid, start, size, ops);
if (ret < 0)
goto error;
diff --git a/mm/sparse.c b/mm/sparse.c
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -393,7 +393,7 @@
* map was not consumed and must be freed.
*/
int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
- int nr_pages)
+ int nr_pages, const struct section_ops *ops)
{
unsigned long section_nr = pfn_to_section_nr(start_pfn);
struct pglist_data *pgdat = zone->zone_pgdat;
@@ -428,6 +428,7 @@
}
ms->section_mem_map |= SECTION_MARKED_PRESENT;
+ ms->ops = ops;
ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
* [PATCH 6 of 6] xen-balloon: define a section_ops
2008-04-04 0:05 [PATCH 0 of 6] [RFC] another attempt at making hotplug memory and xen play together Jeremy Fitzhardinge
` (4 preceding siblings ...)
2008-04-04 0:05 ` [PATCH 5 of 6] hotplug-memory: add section_ops Jeremy Fitzhardinge
@ 2008-04-04 0:05 ` Jeremy Fitzhardinge
5 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:05 UTC
To: KAMEZAWA Hiroyuki, Yasunori Goto, Dave Hansen
Cc: Ingo Molnar, LKML, Christoph Lameter
Define a xen_balloon_section_ops to control how xen balloon pages are onlined.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
drivers/xen/balloon.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -139,6 +139,21 @@
}
}
+/* Only online a page if there's some memory backing it */
+static bool xen_online_page(struct page *page)
+{
+ unsigned long pfn = page_to_pfn(page);
+
+ if (get_phys_to_machine(pfn) == INVALID_P2M_ENTRY)
+ return false;
+
+ return online_page(page);
+}
+
+static const struct section_ops xen_balloon_section_ops = {
+ .online_page = xen_online_page
+};
+
/* hotplug some memory we can add pages to */
static void balloon_expand(unsigned pages)
{
@@ -165,7 +180,7 @@
start_pfn = res->start >> PAGE_SHIFT;
end_pfn = (res->end + 1) >> PAGE_SHIFT;
- ret = add_memory_resource(0, res, &default_section_ops);
+ ret = add_memory_resource(0, res, &xen_balloon_section_ops);
if (ret)
goto release_res;
* Re: [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 0:05 ` [PATCH 4 of 6] hotplug-memory: use common online_page Jeremy Fitzhardinge
@ 2008-04-04 0:47 ` Dave Hansen
2008-04-04 0:56 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 0:47 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
> Apart from x86-32, all architectures use an identical online_page.
> Define it weakly in mm/memory_hotplug.c so that architectures can
> override it if necessary.
This is OK, but I do think the __weak stuff can make it a bit
counter-intuitive to track down which version is actually getting used.
Are there any people that currently override that __weak?
If not, we should probably just kill it.
-- Dave
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 0:05 ` [PATCH 5 of 6] hotplug-memory: add section_ops Jeremy Fitzhardinge
@ 2008-04-04 0:51 ` Dave Hansen
2008-04-04 1:12 ` Jeremy Fitzhardinge
2008-04-04 1:47 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 0:51 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
> Add a per-section "section_ops" structure, allowing each section to have
> specific functions for onlining and offlining pages within the section.
> This is used by the Xen balloon driver to make sure that pages are not
> onlined without some physical memory backing them.
This is kinda a lot of code and mucking around for what we actually get
out of it, especially since you just condensed down all the actual
online_page() instances.
I think it might just be nicer to have a global list of these handlers
somewhere. The Xen driver can just say "put me on the list of
callbacks" and we'll call them at online_page(). I really don't think
we need to be passing an ops structure around.
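Something like this, say (names invented purely for illustration):

    struct online_page_handler {
            struct list_head list;
            /* return true if this handler claimed the page */
            bool (*online_page)(struct page *page);
    };

    int register_online_page_handler(struct online_page_handler *h);

online_page() would then walk the list and fall back to the default
behaviour if nobody claims the page.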
KAME, did you have some other ideas about this?
-- Dave
* Re: [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 0:47 ` Dave Hansen
@ 2008-04-04 0:56 ` Jeremy Fitzhardinge
2008-04-04 1:00 ` Dave Hansen
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 0:56 UTC
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> This is OK, but I do think the __weak stuff can make it a bit
> counter-intuitive to track down which version is actually getting used.
> Are there any people that currently override that __weak?
>
> If not, we should probably just kill it.
>
Yes, x86-32 has a different implementation.
J
* Re: [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 0:56 ` Jeremy Fitzhardinge
@ 2008-04-04 1:00 ` Dave Hansen
2008-04-04 1:11 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:00 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 17:56 -0700, Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > This is OK, but I do think the __weak stuff can make it a bit
> > counter-intuitive to track down which version is actually getting used.
> > Are there any people that currently override that __weak?
> >
> > If not, we should probably just kill it.
>
> Yes, x86-32 has a different implementation.
I guess we could just make the copies conditional on CONFIG_HIGHMEM and
define them both next to each other. It's really a highmem thing more
than a per-arch thing.
-- Dave
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
@ 2008-04-04 1:06 ` Dave Hansen
2008-04-04 1:20 ` Jeremy Fitzhardinge
2008-04-04 1:09 ` Dave Hansen
2008-04-04 1:56 ` Yasunori Goto
2 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:06 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>
> +static void grow_zone_span(unsigned long start_pfn, unsigned long
> end_pfn)
> +{
> unsigned long old_zone_end_pfn;
> + struct zone *zone;
> + unsigned long flags;
> +
> + /*
> + * This doesn't need a lock to do pfn_to_page().
> + * The section can't be removed here because of the
> + * memory_block->state_sem.
> + */
> + zone = page_zone(pfn_to_page(start_pfn));
> + pgdat_resize_lock(zone->zone_pgdat, &flags);
>
> zone_span_writelock(zone);
>
> @@ -149,19 +171,9 @@
> zone->zone_start_pfn;
>
> zone_span_writeunlock(zone);
> -}
>
> -static void grow_pgdat_span(struct pglist_data *pgdat,
> - unsigned long start_pfn, unsigned long end_pfn)
> -{
> - unsigned long old_pgdat_end_pfn =
> - pgdat->node_start_pfn + pgdat->node_spanned_pages;
> -
> - if (start_pfn < pgdat->node_start_pfn)
> - pgdat->node_start_pfn = start_pfn;
> -
> - pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
> - pgdat->node_start_pfn;
> + grow_pgdat_span(zone->zone_pgdat, start_pfn, end_pfn);
> + pgdat_resize_unlock(zone->zone_pgdat, &flags);
> }
>
> static int online_pages_range(unsigned long start_pfn, unsigned long
> nr_pages,
> @@ -180,16 +192,12 @@
> return 0;
> }
I don't particularly like this change to have grow_pgdat_span() called
*from* grow_zone_span(). Seems backwards to me. But, this diff looks
funny (not your fault), so I may just be seeing things. :)
-- Dave
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
2008-04-04 1:06 ` Dave Hansen
@ 2008-04-04 1:09 ` Dave Hansen
2008-04-04 1:32 ` Jeremy Fitzhardinge
2008-04-04 1:56 ` Yasunori Goto
2 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:09 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>
> +int prepare_online_pages(unsigned long pfn, unsigned long nr_pages)
> +{
> + int ret = notify_going_online(pfn, nr_pages);
> + if (ret)
> + return ret;
> +
> + grow_zone_span(pfn, pfn+nr_pages);
> return 0;
> +}
OK, after seeing this used in the Xen driver, I'm even less a fan of the
name. Mostly because it doesn't actually prepare *pages* to be onlined.
It prepares the zones/pgdats and notifies that pages might soon be
online. Can you think of any better names?
grow_and_notify...??
It's also a bit funky because you're calling the online notifiers, but
you're not actually onlining the pages, yet. Does that have any
repercussions?
-- Dave
* Re: [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 1:00 ` Dave Hansen
@ 2008-04-04 1:11 ` Jeremy Fitzhardinge
2008-04-04 1:22 ` Dave Hansen
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 1:11 UTC
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> I guess we could just make the copies conditional on CONFIG_HIGHMEM and
> define them both next to each other. It's really a highmem thing more
> than a per-arch thing.
Sounds good to me. On that subject, how different are the various
arch_add_memory() functions really? There seems to be a lot of common
stuff in there.
J
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 0:51 ` Dave Hansen
@ 2008-04-04 1:12 ` Jeremy Fitzhardinge
2008-04-04 1:52 ` Dave Hansen
2008-04-04 1:47 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 1:12 UTC
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>
>> Add a per-section "section_ops" structure, allowing each section to have
>> specific functions for onlining and offlining pages within the section.
>> This is used by the Xen balloon driver to make sure that pages are not
>> onlined without some physical memory backing them.
>>
>
> This is kinda a lot of code and mucking around for what we actually get
> out of it, especially since you just condensed down all the actual
> online_page() instances.
>
I agree, but it seemed like the most straightforward and architecturally
cleanest approach to me.
> I think it might just be nicer to have a global list of these handlers
> somewhere. The Xen driver can just say "put me on the list of
> callbacks" and we'll call them at online_page(). I really don't think
> we need to be passing an ops structure around.
>
Yes, but it seems a bit awkward. If we assume that:
1. Xen will be the only user of the hook, and
2. Xen-balloon hotplug is exclusive of real memory hotplug
then I guess it's reasonable (though if that's the case it would be
simpler to just put a direct call under #ifdef CONFIG_XEN_BALLOON in there).
But if we think there can be multiple callbacks, and they all get called
on the online of each page, and there can be multiple kinds of hotplug
memory, it gets pretty messy. Each has to determine "why was I called on
this page?", and you'd have to work out which one actually does the job of
onlining. It just seems cleaner to say "this section needs to be
onlined like this", and there's no ambiguity.
I'm already anticipating using the ops mechanism to support another
class of Xen hotplug memory for managing large pages.
J
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 1:06 ` Dave Hansen
@ 2008-04-04 1:20 ` Jeremy Fitzhardinge
2008-04-04 1:33 ` Dave Hansen
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 1:20 UTC
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>
>> +static void grow_zone_span(unsigned long start_pfn, unsigned long
>> end_pfn)
>> +{
>> unsigned long old_zone_end_pfn;
>> + struct zone *zone;
>> + unsigned long flags;
>> +
>> + /*
>> + * This doesn't need a lock to do pfn_to_page().
>> + * The section can't be removed here because of the
>> + * memory_block->state_sem.
>> + */
>> + zone = page_zone(pfn_to_page(start_pfn));
>> + pgdat_resize_lock(zone->zone_pgdat, &flags);
>>
>> zone_span_writelock(zone);
>>
>> @@ -149,19 +171,9 @@
>> zone->zone_start_pfn;
>>
>> zone_span_writeunlock(zone);
>> -}
>>
>> -static void grow_pgdat_span(struct pglist_data *pgdat,
>> - unsigned long start_pfn, unsigned long end_pfn)
>> -{
>> - unsigned long old_pgdat_end_pfn =
>> - pgdat->node_start_pfn + pgdat->node_spanned_pages;
>> -
>> - if (start_pfn < pgdat->node_start_pfn)
>> - pgdat->node_start_pfn = start_pfn;
>> -
>> - pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
>> - pgdat->node_start_pfn;
>> + grow_pgdat_span(zone->zone_pgdat, start_pfn, end_pfn);
>> + pgdat_resize_unlock(zone->zone_pgdat, &flags);
>> }
>>
>> static int online_pages_range(unsigned long start_pfn, unsigned long
>> nr_pages,
>> @@ -180,16 +192,12 @@
>> return 0;
>> }
>>
>
> I don't particularly like this change to have grow_pgdat_span() called
> *from* grow_zone_span(). Seems backwards to me. But, this diff looks
> funny (not your fault), so I may just be seeing things. :)
>
Last time I posted this patch you complained about my name
"online_pages_zone", suggesting "grow_zone_span". Since that already
existed, I took that as a hint to fold the two functions together. Is
that not what you meant?
J
* Re: [PATCH 4 of 6] hotplug-memory: use common online_page
2008-04-04 1:11 ` Jeremy Fitzhardinge
@ 2008-04-04 1:22 ` Dave Hansen
0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:22 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 18:11 -0700, Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > I guess we could just make the copies conditional on CONFIG_HIGHMEM and
> > define them both next to each other. It's really a highmem thing more
> > than a per-arch thing.
>
> Sounds good to me. On that subject, how different are the various
> arch_add_memory() functions really? There seems to be a lot of common
> stuff in there.
Not too different, last I looked. But, they're small enough (like
online_page()) that we didn't really care. We waited for someone like
you to come along and care enough to fix it up. :)
-- Dave
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 1:09 ` Dave Hansen
@ 2008-04-04 1:32 ` Jeremy Fitzhardinge
2008-04-04 1:41 ` Dave Hansen
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 1:32 UTC
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>
>> +int prepare_online_pages(unsigned long pfn, unsigned long nr_pages)
>> +{
>> + int ret = notify_going_online(pfn, nr_pages);
>> + if (ret)
>> + return ret;
>> +
>> + grow_zone_span(pfn, pfn+nr_pages);
>> return 0;
>> +}
>>
>
> OK, after seeing this used in the Xen driver, I'm even less a fan of the
> name. Mostly because it doesn't actually prepare *pages* to be onlined.
> It prepares the zones/pgdats and notifies that pages might soon be
> online. Can you think of any better names?
>
> grow_and_notify...??
>
Does it even need to be a separately visible function? Could it just be
part of add_memory()? Is there any reason to delay doing it until
online_pages()?
That way online_pages() can return to being the one-stop function to
online all the pages, making mark_pages_onlined() redundant.
> It's also a bit funky because you're calling the online notifiers, but
> you're not actually onlining the pages, yet. Does that have any
> repercussions?
No. It will always call the GOING_ONLINE notifier, but it will only
call the ONLINE notifier if it actually bulk-onlines all the pages. In
my page-by-page case, it will never end up calling the ONLINE notifier.
I could call it repeatedly for each page, but I'm not sure how useful
that is (the lack of any users of the ONLINE notifier makes it hard to
judge).
J
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 1:20 ` Jeremy Fitzhardinge
@ 2008-04-04 1:33 ` Dave Hansen
0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:33 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 18:20 -0700, Jeremy Fitzhardinge wrote:
> Last time I posted this patch you complained about my name
> "online_pages_zone", suggesting "grow_zone_span". Since that already
> existed, I took that as a hint to fold the two functions together. Is
> that not what you meant?
I'm obviously a doofus for suggesting a name that already existed. :)
You probably just want a name that lets the caller know that you're
growing both the pgdats and the zones somehow.
-- Dave
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 1:32 ` Jeremy Fitzhardinge
@ 2008-04-04 1:41 ` Dave Hansen
0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:41 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 18:32 -0700, Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
> >> +int prepare_online_pages(unsigned long pfn, unsigned long nr_pages)
> >> +{
> >> + int ret = notify_going_online(pfn, nr_pages);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + grow_zone_span(pfn, pfn+nr_pages);
> >> return 0;
> >> +}
> >>
> >
> > OK, after seeing this used in the Xen driver, I'm even less a fan of the
> > name. Mostly because it doesn't actually prepare *pages* to be onlined.
> > It prepares the zones/pgdats and notifies that pages might soon be
> > online. Can you think of any better names?
> >
> > grow_and_notify...??
>
> Does it even need to be a separately visible function? Could it just be
> part of add_memory()? Is there any reason to delay doing it until
> online_pages()?
Yeah, add_memory() itself is supposed to be easy to roll back. Say the
machine didn't actually need the memory, it could just
give the memory back. No danger of fragmentation or anything.
But, at the same time, we don't want to have this memory sitting there
being unused, but still being accounted for in the VM as real memory.
It's not in use, and making it look in use (by growing the zone spans)
could upset the VM watermarks.
> > It's also a bit funky because you're calling the online notifiers, but
> > you're not actually onlining the pages, yet. Does that have any
> > repercussions?
>
> No. It will always call the GOING_ONLINE notifier, but it will only
> call the ONLINE notifier if it actually bulk-onlines all the pages. In
> my page-by-page case, it will never end up calling the ONLINE notifier.
> I could call it repeatedly for each page, but I'm not sure how useful
> that is (the lack of any users of the ONLINE notifier makes it hard to
> judge).
Ahhh. You're completely right. I was confused by ONLINE and
GOING_ONLINE. Thanks for the reminder!
-- Dave
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 0:51 ` Dave Hansen
2008-04-04 1:12 ` Jeremy Fitzhardinge
@ 2008-04-04 1:47 ` KAMEZAWA Hiroyuki
2008-04-04 5:35 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-04 1:47 UTC
To: Dave Hansen
Cc: Jeremy Fitzhardinge, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 03 Apr 2008 17:51:12 -0700
Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
> > Add a per-section "section_ops" structure, allowing each section to have
> > specific functions for onlining and offlining pages within the section.
> > This is used by the Xen balloon driver to make sure that pages are not
> > onlined without some physical memory backing them.
>
> This is kinda a lot of code and mucking around for what we actually get
> out of it, especially since you just condensed down all the actual
> online_page() instances.
>
> I think it might just be nicer to have a global list of these handlers
> somewhere. The Xen driver can just say "put me on the list of
> callbacks" and we'll call them at online_page(). I really don't think
> we need to be passing an ops structure around.
>
> KAME, did you have some other ideas about this?
>
First of all, please don't call the online_page() handler in the
add_memory() phase; the online_page() handler should be called from
online_pages(). Passing a handler to online_pages() is much easier, and
that's OK with me.
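For example, something like this (illustrative only, not a worked-out
signature):

    int online_pages(unsigned long pfn, unsigned long nr_pages,
                     bool (*online)(struct page *page));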
Thanks,
-Kame
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 1:12 ` Jeremy Fitzhardinge
@ 2008-04-04 1:52 ` Dave Hansen
2008-04-04 5:32 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 1:52 UTC
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 18:12 -0700, Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
> > I think it might just be nicer to have a global list of these handlers
> > somewhere. The Xen driver can just say "put me on the list of
> > callbacks" and we'll call them at online_page(). I really don't think
> > we need to be passing an ops structure around.
> >
>
> Yes, but it seems a bit awkward. If we assume that:
>
> 1. Xen will be the only user of the hook, and
> 2. Xen-balloon hotplug is exclusive of real memory hotplug
>
> then I guess it's reasonable (though if that's the case it would be
> simpler to just put a direct call under #ifdef CONFIG_XEN_BALLOON in there).
Yeah, I'm OK with something along those lines, too. I'd prefer sticking
some stubs in a header and putting the #ifdef there, if only for
aesthetic reasons.
> But if we think there can be multiple callbacks, and they all get called
> on the online of each page, and there can be multiple kinds of hotplug
> memory it gets pretty messy. Each has to determine "why was I called on
> this page?" and you'd to work out which one actually does the job of
> onlining. It just seems cleaner to say "this section needs to be
> onlined like this", and there's no ambiguity.
I really wish we'd stop calling it "page online". :)
Let me think out loud for a sec here. Here's how memory hotplug works
in a nutshell:
First step (add_memory() or probe time):
1. get more memory made available
2. create kva mapping for that memory (for lowmem)
3. allocate 'struct pages'
Second step, 'echo 1 > .../memoryXXX/online' time:
4. modify zone/pgdat spans (make the VM account for the memory)
5. Initialize the 'struct page'
6. free the memory into the buddy allocator
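(Roughly, in terms of today's entry points -- a sketch, not exact code:)

        /* first step, 1-3: arch hook, sparse_add_one_section(), etc. */
        add_memory(nid, start, size);

        /* second step, 4-6: grow spans, init struct pages, free to buddy */
        online_pages(pfn, nr_pages);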
You can't do (2) because Xen doesn't allow mappings to be created until
real backing is there. You've already done this, right?
You don't want to do (6) either, because there is no mapping for the
page and it isn't committed in hardware, yet, so you don't want someone
grabbing it *out* of the buddy allocator and using it.
Your solution is to take those first five steps and have the balloon
driver call them directly. The online_page() modifications are because
5/6 are a bit intertwined right now. Are we on the same page so far?
Why don't we just free the page back into the balloon driver? Take the
existing steps 1-5, use them as they stand today, and just chop off step
6 for Xen. It'd save a bunch of this code churn and also stop us from
proliferating any kind of per-config section-online behavior like you're
asking about above.
That might also be generalizable to anyone else that wants the "fresh
meat" newly-hotplugged memory. Large page users might be other
consumers here.
> I'm already anticipating using the ops mechanism to support another
> class of Xen hotplug memory for managing large pages.
Do tell. :)
-- Dave
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 0:05 ` [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining Jeremy Fitzhardinge
2008-04-04 1:06 ` Dave Hansen
2008-04-04 1:09 ` Dave Hansen
@ 2008-04-04 1:56 ` Yasunori Goto
2008-04-04 5:34 ` Jeremy Fitzhardinge
2 siblings, 1 reply; 30+ messages in thread
From: Yasunori Goto @ 2008-04-04 1:56 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Dave Hansen, Ingo Molnar, LKML,
Christoph Lameter
> +/* Mark a set of pages as online */
> +unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages)
> +{
> + struct zone *zone = page_zone(pfn_to_page(pfn));
> + unsigned long onlined_pages = 0;
> + int need_zonelists_rebuild = 0;
>
> /*
> * If this zone is not populated, then it is not in zonelist.
> @@ -240,10 +246,38 @@
> vm_total_pages = nr_free_pagecache_pages();
> writeback_set_ratelimit();
>
> - if (onlined_pages)
> + if (onlined_pages) {
> + struct memory_notify arg;
> +
> + arg.start_pfn = pfn; /* ? */
> + arg.nr_pages = onlined_pages;
> + arg.status_change_nid = -1; /* ? */
status_change_nid is for preparing data structures that are allocated on
each NUMA node.
When a memoryless node gets new memory and its status changes to a node
with normal memory, status_change_nid must be set.
SLUB uses it now.
Thanks.
--
Yasunori Goto
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 1:52 ` Dave Hansen
@ 2008-04-04 5:32 ` Jeremy Fitzhardinge
2008-04-04 14:22 ` Dave Hansen
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 5:32 UTC (permalink / raw)
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> On Thu, 2008-04-03 at 18:12 -0700, Jeremy Fitzhardinge wrote:
>
>> Dave Hansen wrote:
>>
>>> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>>> I think it might just be nicer to have a global list of these handlers
>>> somewhere. The Xen driver can just say "put me on the list of
>>> callbacks" and we'll call them at online_page(). I really don't think
>>> we need to be passing an ops structure around.
>>>
>>>
>> Yes, but it seems a bit awkward. If we assume that:
>>
>> 1. Xen will be the only user of the hook, and
>> 2. Xen-balloon hotplug is exclusive of real memory hotplug
>>
>> then I guess it's reasonable (though if that's the case it would be
>> simpler to just put a direct call under #ifdef CONFIG_XEN_BALLOON in there).
>>
>
> Yeah, I'm OK with something along those lines, too. I'd prefer sticking
> some stubs in a header and putting the #ifdef there, if only for
> aesthetic reasons.
>
Sure. But I think it's a very non-scalable approach; as soon as there's
a second user who wants to do something like this, it's worth moving to
an ops- or function-pointer-based path.
>> But if we think there can be multiple callbacks, and they all get called
>> on the online of each page, and there can be multiple kinds of hotplug
>> memory it gets pretty messy. Each has to determine "why was I called on
>> this page?" and you'd to work out which one actually does the job of
>> onlining. It just seems cleaner to say "this section needs to be
>> onlined like this", and there's no ambiguity.
>>
>
> I really wish we'd stop calling it "page online". :)
>
> Let me think out loud for a sec here. Here's how memory hotplug works
> in a nutshell:
>
> First step (add_memory() or probe time):
> 1. get more memory made available
> 2. create kva mapping for that memory (for lowmem)
> 3. allocate 'struct pages'
>
> Second step, 'echo 1 > .../memoryXXX/online' time:
> 4. modify zone/pgdat spans (make the VM account for the memory)
> 5. Initialize the 'struct page'
> 6. free the memory into the buddy allocator
>
> You can't do (2) because Xen doesn't allow mappings to be created until
> real backing is there. You've already done this, right?
>
Well, it hasn't been an issue so far, because I've only been working
with x86-32 where all hotplug memory is highmem. But, yes, on x86-64
and other architectures it would have to defer creating the mappings
until it gets the pages (page by page).
> You don't want to do (6) either, because there is no mapping for the
> page and it isn't committed in hardware, yet, so you don't want someone
> grabbing it *out* of the buddy allocator and using it.
>
Right. I do that page by page in the balloon driver; each time I get a
machine page, I bind it to the corresponding page structure and free it
into the allocator.
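(Per page that's roughly the following -- simplified from the
xen-balloon patches, names approximate:)

        /* bind the machine frame to its struct page... */
        set_phys_to_machine(page_to_pfn(page), mfn);

        /* ...then release the page into the buddy allocator */
        ClearPageReserved(page);
        init_page_count(page);
        __free_page(page);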
And I skip the whole "echo online > /sys..." part, because it's
redundant: the use of hotplug memory is an internal implementation
detail of the balloon driver, which users needn't know about when they
deal with the balloon driver.
> Your solution is to take those first five steps and have the balloon
> driver call them directly. The online_page() modifications are because
> 5/6 are a bit intertwined right now. Are we on the same page so far?
>
Right.
> Why don't we just free the page back into the balloon driver? Take the
> existing steps 1-5, use them as they stand today, and just chop off step
> 6 for Xen. It'd save a bunch of this code churn and also stop us from
> proliferating any kind of per-config section-online behavior like you're
> asking about above.
>
That's more or less what I've done. I've grouped steps 1-4 at the point
where the balloon driver decides it needs more page structures, then done
5 and 6 page by page when it actually gets some backing memory.
> That might also be generalizable to anyone else that wants the "fresh
> meat" newly-hotplugged memory. Large page users might be other
> consumers here.
>
Sure. The main problem is that steps 1-3 also end up implicitly
registering the new section with sysfs, so the bulk online interface
gets exposed to usermode. If we make that optional (i.e., a separate
explicit call the memory-adder can choose to make) then everything is
rosy.
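(Hypothetically -- these names are invented, just to illustrate the split:)

        /* steps 1-3 only: grow zones/memmap, no sysfs registration */
        int __add_memory_core(int nid, u64 start, u64 size);

        /* optional: expose the section's online/offline files in sysfs */
        int register_memory_sections(u64 start, u64 size);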
>> I'm already anticipating using the ops mechanism to support another
>> class of Xen hotplug memory for managing large pages.
>>
>
> Do tell. :)
>
I think I mentioned it before. If we 1) modify Xen to manage domain
memory in large pages, 2) have a reasonably small section size, then we
can reasonably do all memory management directly via the hotplug
interface. Bringing each (large) page online would still require some
explicit action, but it would be a much closer fit to how the hotplug
machinery currently works. Then a small user or kernel mode policy
daemon could use it to replicate the existing balloon driver's
functionality.
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1 of 6] hotplug-memory: refactor online_pages to separate zone growth from page onlining
2008-04-04 1:56 ` Yasunori Goto
@ 2008-04-04 5:34 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 5:34 UTC (permalink / raw)
To: Yasunori Goto
Cc: KAMEZAWA Hiroyuki, Dave Hansen, Ingo Molnar, LKML,
Christoph Lameter
Yasunori Goto wrote:
>> +/* Mark a set of pages as online */
>> +unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages)
>> +{
>> + struct zone *zone = page_zone(pfn_to_page(pfn));
>> + unsigned long onlined_pages = 0;
>> + int need_zonelists_rebuild = 0;
>>
>> /*
>> * If this zone is not populated, then it is not in zonelist.
>> @@ -240,10 +246,38 @@
>> vm_total_pages = nr_free_pagecache_pages();
>> writeback_set_ratelimit();
>>
>> - if (onlined_pages)
>> + if (onlined_pages) {
>> + struct memory_notify arg;
>> +
>> + arg.start_pfn = pfn; /* ? */
>> + arg.nr_pages = onlined_pages;
>> + arg.status_change_nid = -1; /* ? */
>>
>
> status_change_nid is for preparing data structures that are allocated on
> each NUMA node.
>
OK. I didn't really understand the intent of this logic in the
GOING_ONLINE notifier path:
arg.status_change_nid = -1;
nid = page_to_nid(pfn_to_page(pfn));
if (node_present_pages(nid) == 0)
arg.status_change_nid = nid;
and how that should relate to the ONLINE notifier.
> When a memoryless node gets new memory and its status changes to a node
> with normal memory, status_change_nid must be set.
>
> SLUB uses it now.
>
Only for GOING_ONLINE.
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 1:47 ` KAMEZAWA Hiroyuki
@ 2008-04-04 5:35 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 5:35 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Dave Hansen, Yasunori Goto, Ingo Molnar, LKML, Christoph Lameter
KAMEZAWA Hiroyuki wrote:
> On Thu, 03 Apr 2008 17:51:12 -0700
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
>
>
>> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>>
>>> Add a per-section "section_ops" structure, allowing each section to have
>>> specific functions for onlining and offlining pages within the section.
>>> This is used by the Xen balloon driver to make sure that pages are not
>>> onlined without some physical memory backing them.
>>>
>> This is kinda a lot of code and mucking around for what we actually get
>> out of it, especially since you just condensed down all the actual
>> online_page() instances.
>>
>> I think it might just be nicer to have a global list of these handlers
>> somewhere. The Xen driver can just say "put me on the list of
>> callbacks" and we'll call them at online_page(). I really don't think
>> we need to be passing an ops structure around.
>>
>> KAME, did you have some other ideas about this?
>>
>>
>
> First of all, please don't call the online_page() handler in the
> add_memory() phase; the online_page() handler should be called from
> online_pages().
>
Yes, that's how it is at the moment.
> Passing a handler to online_pages() is much easier, and that's OK with me.
>
Rather than an ops structure associated with the section? That's a
possibility... I'll see how that looks...
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 5:32 ` Jeremy Fitzhardinge
@ 2008-04-04 14:22 ` Dave Hansen
2008-04-04 18:21 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-04-04 14:22 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
On Thu, 2008-04-03 at 22:32 -0700, Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > First step (add_memory() or probe time):
> > 1. get more memory made available
> > 2. create kva mapping for that memory (for lowmem)
> > 3. allocate 'struct pages'
> >
> > Second step, 'echo 1 > .../memoryXXX/online' time:
> > 4. modify zone/pgdat spans (make the VM account for the memory)
> > 5. Initialize the 'struct page'
> > 6. free the memory into the buddy allocator
> >
> > You can't do (2) because Xen doesn't allow mappings to be created until
> > real backing is there. You've already done this, right?
...
> > You don't want to do (6) either, because there is no mapping for the
> > page and it isn't committed in hardware, yet, so you don't want someone
> > grabbing it *out* of the buddy allocator and using it.
> >
>
> Right. I do that page by page in the balloon driver; each time I get a
> machine page, I bind it to the corresponding page structure and free it
> into the allocator.
>
> And I skip the whole "echo online > /sys..." part, because it's
> redundant: the use of hotplug memory is an internal implementation
> detail of the balloon driver, which users needn't know about when they
> deal with the balloon driver.
I disagree that it is redundant. There is actually a lot of power
there, and it was done for some very specific reasons. They're actually
especially important to your particular use, which I'll try to explain.
Say that you have a very fragmented lowmem-only machine and you want to
add a section of new mem_map[]. That's 512MB/PAGE_SIZE*sizeof(struct
page), which is between 4 and 8MB. So, you need 8MB of contiguous pages
for that mem_map[]. The way we designed it, you could always have that
extra section sitting there, unused. You have that overhead of 8MB
sitting there, but the benefit is that you can *ALWAYS* add memory.
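(Working the numbers: 512MB / 4KB per page = 131072 struct pages; at
32-64 bytes per struct page, that's the 4-8MB of contiguous space needed
for the section's mem_map[].)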
I worry that the current patches as posted don't allow this kind of
behavior. If there was ever a need to preallocate the space, it would
need to be done inside the balloon driver itself, and a way found to
pass that into the hotplug code.
It is especially important for Xen because all of the other
architectures, when getting a new section, magically have a large new
contiguous area out of which to get a new mem_map[] or other structures.
Xen doesn't actually have any contiguous memory at all since it will be
adding a page at a time. Could you add some new functionality to the
Xen driver and make sure to allocate the new section's mem_map[] inside
the section? That would get around the problem.
> > That might also be generalizable to anyone else that wants the "fresh
> > meat" newly-hotplugged memory. Large page users might be other
> > consumers here.
>
> Sure. The main problem is that steps 1-3 also end up implicitly
> registering the new section with sysfs, so the bulk online interface
> gets exposed to usermode. If we make that optional (i.e., a separate
> explicit call the memory-adder can choose to make) then everything is
> rosy.
OK, but the registering isn't a bad thing, either. You just don't want
all of the pages to be stuck back into the allocator. You want to suck
them into the balloon driver. Right?
Let's put a hook in online_page() to divert the new pages into your
driver instead of into the allocator. Problem solved, right?
We might do this with a free_hotplugged_page() function. It can, in the
initial introduction patch, just call free_page(). You can modify it
later to do your bidding.
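(I.e., something like this as the initial stub -- free_hotplugged_page()
being the proposed hook, not something in any tree yet:)

        /* default behaviour: hand the freshly-hotplugged page straight
         * to the buddy allocator; Xen would override this to capture
         * the page in the balloon driver instead */
        void free_hotplugged_page(struct page *page)
        {
                __free_page(page);
        }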
> >> I'm already anticipating using the ops mechanism to support another
> >> class of Xen hotplug memory for managing large pages.
> >>
> >
> > Do tell. :)
> >
>
> I think I mentioned it before. If we 1) modify Xen to manage domain
> memory in large pages, 2) have a reasonably small section size, then we
> can reasonably do all memory management directly via the hotplug
> interface. Bringing each (large) page online would still require some
> explicit action, but it would be a much closer fit to how the hotplug
> machinery currently works. Then a small user or kernel mode policy
> daemon could use it to replicate the existing balloon driver's
> functionality.
This sounds like a tall order considering Xen's current tight coupling
between memory mapping and allocation, but I trust you and your
colleagues to get it done. :)
-- Dave
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 14:22 ` Dave Hansen
@ 2008-04-04 18:21 ` Jeremy Fitzhardinge
2008-04-04 19:28 ` Christoph Lameter
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 18:21 UTC (permalink / raw)
To: Dave Hansen
Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML,
Christoph Lameter
Dave Hansen wrote:
> I disagree that it is redundant. There is actually a lot of power
> there, and it was done for some very specific reasons. They're actually
> especially important to your particular use, which I'll try to explain.
>
> Say that you have a very fragmented lowmem-only machine and you want to
> add a section of new mem_map[]. That's 512MB/PAGE_SIZE*sizeof(struct
> page), which is between 4 and 8MB. So, you need 8MB of contiguous pages
> for that mem_map[]. The way we designed it, you could always have that
> extra section sitting there, unused. You have that overhead of 8MB
> sitting there, but the benefit is that you can *ALWAYS* add memory.
>
> I worry that the current patches as posted don't allow this kind of
> behavior. If there was ever a need to preallocate the space, it would
> need to be done inside the balloon driver itself, and a way found to
> pass that into the hotplug code.
>
Yes, there's a chicken-and-egg problem if you want to use the new memory to
describe itself. But in the x86-32 case, hotplug memory is always
highmem, and so can't be used for page structures, right?
> It is especially important for Xen because all of the other
> architectures, when getting a new section, magically have a large new
> contiguous area out of which to get a new mem_map[] or other structures.
> Xen doesn't actually have any contiguous memory at all since it will be
> adding a page at a time. Could you add some new functionality to the
> Xen driver and make sure to allocate the new section's mem_map[] inside
> the section? That would get around the problem.
>
At the time the balloon driver decides it needs some new page
structures, it has some new pages in hand. If it knew where to put
them, it could map those new pages as the seed of a new section. The
problem is guaranteeing enough pages to be able to populate the
section's memmap - it's a bit of a catch-22 if you can't hotplug new
memory unless you can allocate at least 4-8MB.
>>> That might also be generalizable to anyone else that wants the "fresh
>>> meat" newly-hotplugged memory. Large page users might be other
>>> consumers here.
>>>
>> Sure. The main problem is that steps 1-3 also end up implicitly
>> registering the new section with sysfs, so the bulk online interface
>> gets exposed to usermode. If we make that optional (i.e., a separate
>> explicit call the memory-adder can choose to make) then everything is
>> rosy.
>>
>
> OK, but the registering isn't a bad thing, either. You just don't want
> all of the pages to be stuck back into the allocator. You want to suck
> them into the balloon driver. Right?
>
Yeah. And I have no objection to things appearing in /sys, or even
making the "state" files do something sensible if you write to them. I
just don't want to make it too easy to shoot yourself in the foot.
> Let's put a hook in online_page() to divert the new pages into your
> driver instead of into the allocator. Problem solved, right?
>
> We might do this with a free_hotplugged_page() function. It can, in the
> initial introduction patch, just call free_page(). You can modify it
> later to do your bidding.
>
Sure. That's more or less what the ops patch lets me do. The question
is whether the right answer is:
1. add an ops structure to struct mem_section
2. pass an "online_page" function pointer into online_pages, or
3. have a special-case Xen hook in online_page
I've already implemented 1, Kame suggested 2, and we also mentioned 3 in
passing.
I think 2 is looking like a nice tradeoff between complexity and
flexibility.
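(A sketch of what 2 might look like -- hypothetical signature:)

        /* caller supplies the per-page online behaviour; NULL would mean
         * the default free-into-buddy online_page() */
        extern int online_pages(unsigned long pfn, unsigned long nr_pages,
                                void (*online_page)(struct page *page));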
> This sounds like a tall order considering Xen's current tight coupling
> between memory mapping and allocation, but I trust you and your
> colleagues to get it done. :)
>
The combination of needing to map pagetables RO and grant tables for IO
makes it unlikely that we'll be able to map the kernel space with large
pages in the short term. But it does allow usermode to use large pages,
and maybe it would be worth trying to work out how to reconfigure the
kernel address space to separate kernel code/data from pagetable and IO
mappings...
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 18:21 ` Jeremy Fitzhardinge
@ 2008-04-04 19:28 ` Christoph Lameter
2008-04-04 20:38 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2008-04-04 19:28 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Dave Hansen, KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML
On Fri, 4 Apr 2008, Jeremy Fitzhardinge wrote:
> > Say that you have a very fragmented lowmem-only machine and you want to
> > add a section of new mem_map[]. That's 512MB/PAGE_SIZE*sizeof(struct
> > page), which is between 4 and 8MB. So, you need 8MB of contiguous pages
> > for that mem_map[]. The way we designed it, you could always have that
Note that you could use 4k page size chunks for the memmap if it's
virtualized (CONFIG_SPARSEMEM_VMEMMAP). Instead of a single PMD
pointing to a 2MB block, you would have the PMD point to a block of
ptes that would in turn point to a discontiguous series of 4k pages.
The memory is virtually contiguous, so all the logic stays the same. What
would be needed are some enhancements to the way the memmap is populated.
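(The generic helpers for that already exist in mm/sparse-vmemmap.c; an
arch's 4k fallback would look roughly like this, modulo __meminit
annotations:)

        /* populate the section's memmap with individual 4k pages
         * instead of a single 2MB PMD mapping */
        int vmemmap_populate(struct page *start_page,
                             unsigned long size, int node)
        {
                return vmemmap_populate_basepages(start_page, size, node);
        }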
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5 of 6] hotplug-memory: add section_ops
2008-04-04 19:28 ` Christoph Lameter
@ 2008-04-04 20:38 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-04 20:38 UTC (permalink / raw)
To: Christoph Lameter
Cc: Dave Hansen, KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML
Christoph Lameter wrote:
> Note that you could use 4k page size chunks for the memmap if it's
> virtualized (CONFIG_SPARSEMEM_VMEMMAP). Instead of a single PMD
> pointing to a 2MB block, you would have the PMD point to a block of
> ptes that would in turn point to a discontiguous series of 4k pages.
>
> The memory is virtually contiguous, so all the logic stays the same. What
> would be needed are some enhancements to the way the memmap is populated.
In Xen I think that would be accommodated by the existing
pseudo-physical-to-machine mapping. We don't support PSE anyway, so all
the pages would be individual 4k pages; the fact that they're
machine-discontiguous is hidden by the pseudo-phys-to-machine mapping
when the ptes are constructed.
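(I.e., the pte is always constructed through the p2m layer -- a sketch
using the existing helpers:)

        /* any machine frame can back a virtually-contiguous memmap page */
        mfn = pfn_to_mfn(pfn);
        set_pte(ptep, mfn_pte(mfn, PAGE_KERNEL));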
J
^ permalink raw reply [flat|nested] 30+ messages in thread