public inbox for linux-kernel@vger.kernel.org
* Trying to make use of hotplug memory for xen balloon driver
@ 2008-03-26 23:11 Jeremy Fitzhardinge
  2008-03-27  0:09 ` Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-26 23:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter
  Cc: Linux Kernel Mailing List, Anthony Liguori, Chris Wright

Hi,

I'm trying to make use of hotplug memory in the Xen balloon driver.  To
expand a domain beyond its initial size, the driver must add new page
structures to describe the new memory.

The platform is x86-32, with CONFIG_SPARSEMEM and
CONFIG_MEMORY_HOTPLUG.  Because the new memory is only pseudo-physical,
the physical address within the domain is arbitrary, so I added an
add_memory_resource() function that lets me use allocate_resource() to
find an appropriate address for the new memory.

When I want to expand the domain's memory, I do (error checking edited 
out for brevity):

        res = kzalloc(sizeof(*res), GFP_KERNEL);

        res->name = "Xen Balloon";
        res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

        ret = allocate_resource(&iomem_resource, res, size, 0, -1,
                                PAGE_SIZE, NULL, NULL);

        ret = add_memory_resource(0, res);

        start_pfn = res->start >> PAGE_SHIFT;
        end_pfn = (res->end + 1) >> PAGE_SHIFT;

        ret = xen_resize_phys_to_mach(end_pfn);

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                struct page *page = pfn_to_page(pfn);

                if (PageReserved(page))
                        continue;

                set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
                balloon_append(page);
        }

At this point the pages have no underlying machine (physical) memory,
but they are on the list of potentially usable pages.  This all works fine.
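
The PFN arithmetic in the snippet above relies on the resource's end
address being inclusive.  A minimal userspace sketch of that conversion,
assuming 4KB pages; struct mem_range and the range_*() helpers are
hypothetical stand-ins for illustration, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4KB pages, as on x86 */

/* Hypothetical stand-in for the start/end fields of struct resource.
 * As in the kernel, 'end' is the last byte of the range (inclusive). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* First page frame number covered by the range. */
static uint64_t range_start_pfn(const struct mem_range *r)
{
	return r->start >> PAGE_SHIFT;
}

/* One past the last page frame number, hence the "+ 1" on the
 * inclusive end address before shifting. */
static uint64_t range_end_pfn(const struct mem_range *r)
{
	assert(r->end >= r->start);
	return (r->end + 1) >> PAGE_SHIFT;
}

/* Number of whole pages in a page-aligned range. */
static uint64_t range_nr_pages(const struct mem_range *r)
{
	return range_end_pfn(r) - range_start_pfn(r);
}
```

The same inclusive-end convention explains why a size is computed as
end - start + 1 rather than end - start.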

However, when I actually want to use one of these pages, I do:

                page = balloon_retrieve();

                pfn = page_to_pfn(page);

                set_phys_to_machine(pfn, frame_list[i]);

                /* Relinquish the page back to the allocator. */
                online_page(page);

                /* Link back into the page tables if not highmem. */
                if (pfn < max_low_pfn) {	/* !PageHighMem(page) ? */
                        int ret;
                        ret = HYPERVISOR_update_va_mapping(
                                (unsigned long)__va(pfn << PAGE_SHIFT),
                                mfn_pte(frame_list[i], PAGE_KERNEL),
                                0);
                        BUG_ON(ret);
                }

This has two problems:

   1. online_page() raises an error:

      Bad page state in process 'events/0'
      page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
      Trying to fix it up, but a reboot is needed
      Backtrace:
      Pid: 9, comm: events/0 Not tainted 2.6.25-rc7-x86-latest.git-dirty #353
       [<c015643a>] bad_page+0x55/0x82
       [<c0156be6>] free_hot_cold_page+0x60/0x1f1
       [<c0103069>] ? xen_restore_fl+0x2e/0x52
       [<c0156dae>] free_hot_page+0xa/0xc
       [<c0156dcb>] __free_pages+0x1b/0x26
       [<c0466e8c>] free_new_highpage+0x11/0x19
       [<c0466ea1>] online_page+0xd/0x1b
       [<c02809ac>] balloon_process+0x1e6/0x4d3
       [<c014671a>] ? lock_acquire+0x90/0x9d
       [<c0137720>] run_workqueue+0xbb/0x186
       [<c01376e5>] ? run_workqueue+0x80/0x186
       [<c02807c6>] ? balloon_process+0x0/0x4d3
       [<c0137fe6>] ? worker_thread+0x0/0xbe
       [<c0138099>] worker_thread+0xb3/0xbe
       [<c013a635>] ? autoremove_wake_function+0x0/0x33
       [<c013a56a>] kthread+0x3b/0x61
       [<c013a52f>] ? kthread+0x0/0x61
       [<c0108b67>] kernel_thread_helper+0x7/0x10
       =======================
          

      I can solve this by putting an explicit reset_page_mapcount(page)
      before online_page(), but I can't see any other hotplug memory
      code which does this.

   2. The new pages don't appear to be in the right zone.  When I boot a
      256M domain I get an initial setup of:

      Zone PFN ranges:
        DMA             0 ->     4096
        Normal       4096 ->    65536
        HighMem     65536 ->    65536
      Movable zone start PFN for each node
      early_node_map[1] active PFN ranges
          0:        0 ->    65536
      On node 0 totalpages: 65536
        DMA zone: 52 pages used for memmap
        DMA zone: 0 pages reserved
        DMA zone: 4044 pages, LIFO batch:0
        Normal zone: 780 pages used for memmap
        Normal zone: 60660 pages, LIFO batch:15
        HighMem zone: 0 pages used for memmap
        Movable zone: 0 pages used for memmap
          

      which presumably means that new pages above pfn 65536 should be in
      the highmem zone?  But PageHighMem() returns false for those pages.
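
      One way to read the boot log above: the HighMem span is empty
      (65536 -> 65536), so a pfn at or above 65536 falls inside no
      boot-time zone span at all.  A minimal userspace sketch of that
      lookup, using only the ranges printed above; struct zone_span and
      boot_zone_index() are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a zone's PFN span; end_pfn is exclusive. */
struct zone_span {
	const char *name;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Spans copied from the "Zone PFN ranges" boot output. */
static const struct zone_span boot_zones[] = {
	{ "DMA",         0,  4096 },
	{ "Normal",   4096, 65536 },
	{ "HighMem", 65536, 65536 },	/* empty at boot */
};

#define NR_BOOT_ZONES (sizeof(boot_zones) / sizeof(boot_zones[0]))

/* Return the index of the boot-time zone containing pfn, or -1 if the
 * pfn lies outside every span (as a hot-added pfn would). */
static int boot_zone_index(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < NR_BOOT_ZONES; i++) {
		assert(boot_zones[i].start_pfn <= boot_zones[i].end_pfn);
		if (pfn >= boot_zones[i].start_pfn &&
		    pfn < boot_zones[i].end_pfn)
			return (int)i;
	}
	return -1;
}
```

      Under this model, some zone's span has to be grown at hot-add time
      before the new pages can belong to it.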

What am I missing here?

Thanks,
    J


* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-26 23:11 Trying to make use of hotplug memory for xen balloon driver Jeremy Fitzhardinge
@ 2008-03-27  0:09 ` Dave Hansen
  2008-03-27  0:15   ` Jeremy Fitzhardinge
  2008-03-27  1:23   ` Christoph Lameter
  2008-03-27  0:26 ` Dave Hansen
  2008-03-27  0:50 ` KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2008-03-27  0:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright


On Wed, 2008-03-26 at 16:11 -0700, Jeremy Fitzhardinge wrote:
> 
> 
> I'm trying to make use of hotplug memory in the Xen balloon driver.  If
> you want to expand a domain to be larger than its initial size, it must
> add new page structures to describe the new memory.
>
> The platform is x86-32, with CONFIG_SPARSEMEM and
> CONFIG_MEMORY_HOTPLUG.  Because the new memory is only pseudo-physical,
> the physical address within the domain is arbitrary, and I added an
> add_memory_resource() function so I could use allocate_resource() to
> find an appropriate address to put the new memory at.
>
> When I want to expand the domain's memory, I do (error checking edited
> out for brevity):
> 
>         res = kzalloc(sizeof(*res), GFP_KERNEL);
> 
>         res->name = "Xen Balloon";
>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> 
>         ret = allocate_resource(&iomem_resource, res, size, 0, -1,
>                                 PAGE_SIZE, NULL, NULL);
> 
>         ret = add_memory_resource(0, res);

Yeah, this is your problem.  You've only allocated the iomem *resource*
for the memory area, which means that you've basically claimed the
physical addresses.

But, you don't have any 'struct page's there.

We really screwed up the memory hotplug code and ended up with some
incredibly arcane function names.  You might want to look at
add_memory().  It is hidden away in mm/memory_hotplug.c :)

You might also note that most of the ppc64 memory hotplug is driven by
userspace.  The hypervisor actually contacts a daemon on the guest to
tell it where its new memory is.  That daemon does the addition
through /sys/devices/system/memory/probe.  

-- Dave



* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  0:09 ` Dave Hansen
@ 2008-03-27  0:15   ` Jeremy Fitzhardinge
  2008-03-27  1:23   ` Christoph Lameter
  1 sibling, 0 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-27  0:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright

Dave Hansen wrote:
> Yeah, this is your problem.  You've only allocated the iomem *resource*
> for the memory area, which means that you've basically claimed the
> physical addresses.
>
> But, you don't have any 'struct page's there.
>
> We really screwed up the memory hotplug code and ended up with some
> incredibly arcane function names.  You might want to look at
> add_memory().  It is hidden away in mm/memory_hotplug.c :)
>   

Sorry, I should have been clearer.  add_memory_resource() is a function 
I added; it's effectively add_memory() with the resource-allocating part 
factored out:

--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -171,7 +171,10 @@
 
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
+struct resource;
+
 extern int add_memory(int nid, u64 start, u64 size);
+extern int add_memory_resource(int nid, struct resource *res);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int remove_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
===================================================================
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,14 +278,28 @@
 
 int add_memory(int nid, u64 start, u64 size)
 {
-	pg_data_t *pgdat = NULL;
-	int new_pgdat = 0;
 	struct resource *res;
 	int ret;
 
 	res = register_memory_resource(start, size);
 	if (!res)
 		return -EEXIST;
+
+	ret = add_memory_resource(nid, res);
+
+	if (ret)
+		release_memory_resource(res);
+
+	return ret;
+}
+
+int add_memory_resource(int nid, struct resource *res)
+{
+	pg_data_t *pgdat = NULL;
+	int new_pgdat = 0;
+	int ret;
+	u64 start = res->start;
+	u64 size = res->end - res->start + 1;
 
 	if (!node_online(nid)) {
 		pgdat = hotadd_new_pgdat(nid, start);
@@ -320,8 +334,6 @@
 	/* rollback pgdat allocation and others */
 	if (new_pgdat)
 		rollback_node_hotadd(nid, pgdat);
-	if (res)
-		release_memory_resource(res);
 
 	return ret;
 }


> You might also note that most of the ppc64 memory hotplug is driven by
> userspace.  The hypervisor actually contacts a daemon on the guest to
> tell it where its new memory is.  That daemon does the addition
> through /sys/devices/system/memory/probe.  
>   

x86 Xen does it with a combination of hypervisor and userspace.  Mostly
it comes down to asking the hypervisor to provide a machine page to put 
under a guest pseudo-physical page.

    J


* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-26 23:11 Trying to make use of hotplug memory for xen balloon driver Jeremy Fitzhardinge
  2008-03-27  0:09 ` Dave Hansen
@ 2008-03-27  0:26 ` Dave Hansen
  2008-03-27 22:23   ` Jeremy Fitzhardinge
  2008-03-27  0:50 ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2008-03-27  0:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright


On Wed, 2008-03-26 at 16:11 -0700, Jeremy Fitzhardinge wrote:
>       Bad page state in process 'events/0'
>       page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
>       Trying to fix it up, but a reboot is needed

The flags being all null looks highly suspicious to me.

Once you've done an add_memory(), the new sections should show up
in /sys.  Do you see them in there?

Once they show up, you can online them with:

	echo online > /sys/devices/system/memory/memoryXXX/state

That's what actually goes and mucks with the 'struct zone's and the
pgdats to expand them.  It will also call online_page() on the whole
range.  I think you're trying to do this manually, and missing part of
it.  

There's some documentation here:

http://kerneltrap.org/node/14009

But, think of it this way: "add" is what the hardware does.  "online" is
what Linux does after the memory has been added so that it can be used.
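
The "online" step can also be driven from a program rather than a shell.
A minimal userspace sketch, assuming the sysfs layout named above
(memoryXXX/state under /sys/devices/system/memory); the two helper names
are hypothetical, and the base directory is a parameter only so the path
logic can be exercised without touching a real system:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<base>/memory<block>/state" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int memory_state_path(char *buf, size_t len,
			     const char *base, unsigned int block)
{
	int n;

	assert(buf != NULL && base != NULL);
	n = snprintf(buf, len, "%s/memory%u/state", base, block);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "online" to the block's state file.
 * Returns 0 on success, -1 on any failure. */
static int online_memory_block(const char *base, unsigned int block)
{
	static const char state[] = "online";
	char path[256];
	FILE *f;
	int ok;

	if (memory_state_path(path, sizeof(path), base, block) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fwrite(state, 1, strlen(state), f) == strlen(state);
	if (fclose(f) != 0)	/* sysfs reports write errors at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

On a real system the call would look like
online_memory_block("/sys/devices/system/memory", 8), and it needs root.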

-- Dave



* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-26 23:11 Trying to make use of hotplug memory for xen balloon driver Jeremy Fitzhardinge
  2008-03-27  0:09 ` Dave Hansen
  2008-03-27  0:26 ` Dave Hansen
@ 2008-03-27  0:50 ` KAMEZAWA Hiroyuki
  2008-03-27  5:57   ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-03-27  0:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

On Wed, 26 Mar 2008 16:11:54 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Hi,
> 
> I'm trying to make use of hotplug memory in the Xen balloon driver.  If 
> you want to expand a domain to be larger than its initial size, it must 
> add new page structures to describe the new memory.
> 
> The platform is x86-32, with CONFIG_SPARSEMEM and
> CONFIG_MEMORY_HOTPLUG.  Because the new memory is only pseudo-physical,
> the physical address within the domain is arbitrary, and I added an
> add_memory_resource() function so I could use allocate_resource() to
> find an appropriate address to put the new memory at.
> 
Welcome to the chaos of memory hotplug :)

>    1. the online_page() raises an error:
> 
>       Bad page state in process 'events/0'
>       page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
>       Trying to fix it up, but a reboot is needed

Hmm, it seems the memmap is not initialized correctly...
page->flags == 0 means the page is in ZONE_DMA (which covers only the
first 16MB on x86), so I think the memmap was never initialized.

The call path to memmap initialization is:
==
  add_memory()
	-> arch_add_memory()
		-> __add_pages()
			-> __add_section()
				-> __add_zone()
					-> memmap_init_zone()
==
First, please check whether arch_add_memory() is called.
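
To see why page->flags == 0 reads as ZONE_DMA, here is a minimal
userspace sketch of decoding the zone and section from the top bits of a
32-bit flags word.  The field widths (6 section bits, 0 node bits, 2 zone
bits) and the helper names are assumptions chosen for illustration, not
values taken from the running kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, top bits first: | SECTION | NODE | ZONE | ... | FLAGS | */
#define FLAGS_BITS	32
#define SECTIONS_WIDTH	6	/* assumption: 36-bit PAE, 2^30-byte sections */
#define NODES_WIDTH	0	/* assumption: single node */
#define ZONES_WIDTH	2	/* assumption: four zones */

#define SECTIONS_PGSHIFT (FLAGS_BITS - SECTIONS_WIDTH)
#define ZONES_PGSHIFT	 (SECTIONS_PGSHIFT - NODES_WIDTH - ZONES_WIDTH)
#define SECTIONS_MASK	 ((1u << SECTIONS_WIDTH) - 1)
#define ZONES_MASK	 ((1u << ZONES_WIDTH) - 1)

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/* Encode section and zone the way memmap initialization would. */
static uint32_t make_flags(unsigned int section, enum zone_type zone)
{
	assert(section <= SECTIONS_MASK);
	return ((uint32_t)section << SECTIONS_PGSHIFT) |
	       ((uint32_t)zone << ZONES_PGSHIFT);
}

/* Extract the zone number from the flags word. */
static enum zone_type flags_zonenum(uint32_t flags)
{
	return (enum zone_type)((flags >> ZONES_PGSHIFT) & ZONES_MASK);
}

/* Extract the section number from the flags word. */
static unsigned int flags_section(uint32_t flags)
{
	return (flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}
```

A never-initialized (all-zero) flags word therefore decodes as section 0,
zone 0, which is ZONE_DMA in this numbering.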



>    2. The new pages don't appear to be in the right zone.  When I boot a
>       256M domain I get an initial setup of:
> 
>       Zone PFN ranges:
>         DMA             0 ->     4096
>         Normal       4096 ->    65536
>         HighMem     65536 ->    65536
>       Movable zone start PFN for each node
>       early_node_map[1] active PFN ranges
>           0:        0 ->    65536
>       On node 0 totalpages: 65536
>         DMA zone: 52 pages used for memmap
>         DMA zone: 0 pages reserved
>         DMA zone: 4044 pages, LIFO batch:0
>         Normal zone: 780 pages used for memmap
>         Normal zone: 60660 pages, LIFO batch:15
>         HighMem zone: 0 pages used for memmap
>         Movable zone: 0 pages used for memmap
>           
> 
>       which presumably means that new pages above pfn 65536 should be in
>       the highmem zone?  But PageHighMem() returns false for those pages.
> 
See x86-32's arch_add_memory().  It is currently designed so that all new
memory goes into ZONE_HIGHMEM (because added memory tends to be removed
later).

Thanks,
-Kame




* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  0:09 ` Dave Hansen
  2008-03-27  0:15   ` Jeremy Fitzhardinge
@ 2008-03-27  1:23   ` Christoph Lameter
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2008-03-27  1:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Dave Hansen, KAMEZAWA Hiroyuki, Yasunori Goto,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright

On Wed, 26 Mar 2008, Dave Hansen wrote:

> You might also note that most of the ppc64 memory hotplug is driven by
> userspace.  The hypervisor actually contacts a daemon on the guest to
> tell it where its new memory is.  That daemon does the addition
> through /sys/devices/system/memory/probe.  

Would it be possible to have the balloon driver use the memory hotplug 
interface instead? That would generalize the memory hotplug logic and you 
will likely find that lots of issues have already been addressed in that 
code.





* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  0:50 ` KAMEZAWA Hiroyuki
@ 2008-03-27  5:57   ` Jeremy Fitzhardinge
  2008-03-27  6:11     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-27  5:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

KAMEZAWA Hiroyuki wrote:
> On Wed, 26 Mar 2008 16:11:54 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> Hi,
>>
>> I'm trying to make use of hotplug memory in the Xen balloon driver.  If 
>> you want to expand a domain to be larger than its initial size, it must 
>> add new page structures to describe the new memory.
>>
>> The platform is x86-32, with CONFIG_SPARSEMEM and
>> CONFIG_MEMORY_HOTPLUG.  Because the new memory is only pseudo-physical,
>> the physical address within the domain is arbitrary, and I added an
>> add_memory_resource() function so I could use allocate_resource() to
>> find an appropriate address to put the new memory at.
>>
>>     
> Welcome to the chaos of memory hotplug :)
>
>   
>>    1. the online_page() raises an error:
>>
>>       Bad page state in process 'events/0'
>>       page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
>>       Trying to fix it up, but a reboot is needed
>>     
>
> Hmm, it seems the memmap is not initialized correctly...
> page->flags == 0 means the page is in ZONE_DMA (which covers only the
> first 16MB on x86), so I think the memmap was never initialized.
>
> The call path to memmap initialization is:
> ==
>   add_memory()
> 	-> arch_add_memory()
> 		-> __add_pages()
> 			-> __add_section()
> 				-> __add_zone()
> 					-> memmap_init_zone()
> ==
> First, please check whether arch_add_memory() is called.
>   

Ah, I see what it is.  I wasn't trying to add enough memory.  Memory is
added in units of one section (2^SECTION_SIZE_BITS bytes), which is 2^30
bytes on 32-bit PAE.  When I increase the initial balloon extension to
PAGES_PER_SECTION pages, I make some more progress:

xen_balloon: Initialising balloon driver.
trying to reserve 262144 pages (1073741824 bytes) for balloon
bootmem alloc of 147456 bytes failed!
Kernel panic - not syncing: Out of memory
Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
 [<c01299dc>] panic+0x49/0x102
 [<c0647c3c>] __alloc_bootmem+0x24/0x29
 [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
 [<c044bd97>] zone_wait_table_init+0x45/0x95
 [<c0467258>] init_currently_empty_zone+0x1d/0xaa
 [<c01738ea>] __add_pages+0x88/0xdb
 [<c011c1a5>] arch_add_memory+0x25/0x2b
 [<c01737a9>] add_memory_resource+0x2f/0x36
 [<c064e487>] balloon_init+0x1b8/0x2b9
 [<c0635495>] kernel_init+0x137/0x292
 [<c063535e>] ? kernel_init+0x0/0x292
 [<c063535e>] ? kernel_init+0x0/0x292
 [<c0108b67>] kernel_thread_helper+0x7/0x10
 =======================


What's the rationale for setting SECTION_SIZE_BITS to 30?  Seems like a 
fairly large chunk.
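
For reference, the section arithmetic behind the "262144 pages
(1073741824 bytes)" line in the log can be sketched in userspace as
follows.  This assumes 4KB pages; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4KB pages */

/* Size of one sparsemem section in bytes. */
static uint64_t section_bytes(unsigned int section_size_bits)
{
	assert(section_size_bits < 64);
	return 1ULL << section_size_bits;
}

/* Number of pages in one section (PAGES_PER_SECTION in the kernel). */
static uint64_t pages_per_section(unsigned int section_size_bits)
{
	assert(section_size_bits >= PAGE_SHIFT && section_size_bits < 64);
	return 1ULL << (section_size_bits - PAGE_SHIFT);
}

/* Round a page count up to a whole number of sections, since memory
 * can only be hot-added a full section at a time. */
static uint64_t round_up_to_section(uint64_t nr_pages,
				    unsigned int section_size_bits)
{
	uint64_t per = pages_per_section(section_size_bits);

	return (nr_pages + per - 1) / per * per;
}
```

With SECTION_SIZE_BITS at 30, even a one-page request rounds up to a full
1GB section, which is why a smaller section size looks attractive for
ballooning.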

    J


* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  6:11     ` KAMEZAWA Hiroyuki
@ 2008-03-27  6:09       ` Jeremy Fitzhardinge
  2008-03-27 20:54       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-27  6:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

KAMEZAWA Hiroyuki wrote:
> First, I believe a typical DIMM is larger than the section size
> (2^SECTION_SIZE_BITS bytes).  The default is designed for hardware-based
> hotplug.
>
> If you want to use memory hotplug in a virtualized environment, it's a
> good idea to make the sections smaller.  PowerPC/IBM LPAR uses 16MB
> chunks.
>
> It's a trade-off between section maintenance cost and the size of
> plugged memory.  Please find the best value.
>   

Yes, that's what I thought.  I'd been thinking of something around the 
64-256MB mark.  I'll experiment, but I've got some Xen-specific problems 
to solve first.

Thanks,
    J


* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  5:57   ` Jeremy Fitzhardinge
@ 2008-03-27  6:11     ` KAMEZAWA Hiroyuki
  2008-03-27  6:09       ` Jeremy Fitzhardinge
  2008-03-27 20:54       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-03-27  6:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

On Wed, 26 Mar 2008 22:57:57 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ah, I see what it is.  I wasn't trying to add enough memory.  It adds in 
> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE.  When I 
> increase the initial balloon extension to PAGES_PER_SECTION pages, I 
> make some more progress:
> 
> xen_balloon: Initialising balloon driver.
> trying to reserve 262144 pages (1073741824 bytes) for balloon
> bootmem alloc of 147456 bytes failed!
> Kernel panic - not syncing: Out of memory
> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
>  [<c01299dc>] panic+0x49/0x102
>  [<c0647c3c>] __alloc_bootmem+0x24/0x29
>  [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
>  [<c044bd97>] zone_wait_table_init+0x45/0x95
>  [<c0467258>] init_currently_empty_zone+0x1d/0xaa
>  [<c01738ea>] __add_pages+0x88/0xdb
>  [<c011c1a5>] arch_add_memory+0x25/0x2b
>  [<c01737a9>] add_memory_resource+0x2f/0x36
>  [<c064e487>] balloon_init+0x1b8/0x2b9
>  [<c0635495>] kernel_init+0x137/0x292
>  [<c063535e>] ? kernel_init+0x0/0x292
>  [<c063535e>] ? kernel_init+0x0/0x292
>  [<c0108b67>] kernel_thread_helper+0x7/0x10
>  =======================
> 
> 
> What's the rationale for setting SECTION_SIZE_BITS to 30?  Seems like a 
> fairly large chunk.
> 
First, I believe a typical DIMM is larger than the section size
(2^SECTION_SIZE_BITS bytes).  The default is designed for hardware-based
hotplug.

If you want to use memory hotplug in a virtualized environment, it's a
good idea to make the sections smaller.  PowerPC/IBM LPAR uses 16MB
chunks.

It's a trade-off between section maintenance cost and the size of
plugged memory.  Please find the best value.

Thanks,
-Kame








* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  6:11     ` KAMEZAWA Hiroyuki
  2008-03-27  6:09       ` Jeremy Fitzhardinge
@ 2008-03-27 20:54       ` Jeremy Fitzhardinge
  2008-03-28  0:20         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-27 20:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

KAMEZAWA Hiroyuki wrote:
> On Wed, 26 Mar 2008 22:57:57 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> Ah, I see what it is.  I wasn't trying to add enough memory.  It adds in 
>> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE.  When I 
>> increase the initial balloon extension to PAGES_PER_SECTION pages, I 
>> make some more progress:
>>
>> xen_balloon: Initialising balloon driver.
>> trying to reserve 262144 pages (1073741824 bytes) for balloon
>> bootmem alloc of 147456 bytes failed!
>> Kernel panic - not syncing: Out of memory
>> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
>>  [<c01299dc>] panic+0x49/0x102
>>  [<c0647c3c>] __alloc_bootmem+0x24/0x29
>>  [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
>>  [<c044bd97>] zone_wait_table_init+0x45/0x95
>>  [<c0467258>] init_currently_empty_zone+0x1d/0xaa
>>  [<c01738ea>] __add_pages+0x88/0xdb
>>  [<c011c1a5>] arch_add_memory+0x25/0x2b
>>  [<c01737a9>] add_memory_resource+0x2f/0x36
>>  [<c064e487>] balloon_init+0x1b8/0x2b9
>>  [<c0635495>] kernel_init+0x137/0x292
>>  [<c063535e>] ? kernel_init+0x0/0x292
>>  [<c063535e>] ? kernel_init+0x0/0x292
>>  [<c0108b67>] kernel_thread_helper+0x7/0x10
>>  =======================
>>
>>
>> What's the rationale for setting SECTION_SIZE_BITS to 30?  Seems like a 
>> fairly large chunk.
>>
>>     
> First, I believe a typical DIMM is larger than the section size
> (2^SECTION_SIZE_BITS bytes).  The default is designed for hardware-based
> hotplug.
>
> If you want to use memory hotplug in a virtualized environment, it's a
> good idea to make the sections smaller.  PowerPC/IBM LPAR uses 16MB
> chunks.
>
> It's a trade-off between section maintenance cost and the size of
> plugged memory.  Please find the best value.

Hm, I tried reducing it to 2^28 (=256M), but I get a compilation failure:

  CC      arch/x86/kernel/asm-offsets.s
In file included from /home/jeremy/hg/xen/paravirt/linux/include/linux/suspend.h:11,
                 from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets_32.c:11,
                 from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets.c:2:
/home/jeremy/hg/xen/paravirt/linux/include/linux/mm.h:458:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
make[3]: *** [arch/x86/kernel/asm-offsets.s] Error 1


2^29 works.

    J



* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27  0:26 ` Dave Hansen
@ 2008-03-27 22:23   ` Jeremy Fitzhardinge
  2008-03-28 18:21     ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-27 22:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright

Dave Hansen wrote:
> The flags being all null looks highly suspicious to me.
>
> Once you've done an add_memory(), the new sections should show up
> in /sys.  Do you see them in there?
>
> Once they show up, you can online them with:
>
> 	echo online > /sys/devices/system/memory/memoryXXX/state
>
> That's what actually goes and mucks with the 'struct zone's and the
> pgdats to expand them.  It will also call online_page() on the whole
> range.  I think you're trying to do this manually, and missing part of
> it.  

Hm, actually this is precisely the wrong thing to do in this case.  When
the balloon driver adds a new section of hotplug memory, it's doing it to
get the page structures, but there's no actual memory backing those
pages.  The memory only comes into existence on a page-by-page basis
when the balloon driver gets memory from the hypervisor and attaches it
to each page (the balloon driver uses online_page() on each page as it's
ready).

If the user does a mass online via /sys the system explodes because it 
onlines a large number of pages which have no backing memory.  Since 
none of those pages can be mapped, the kernel explodes in a variety of 
interesting ways.
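
To make the distinction concrete, here is a toy userspace model (not
kernel code) of a section whose pages gain backing one at a time.  The
state names and model_*() helpers are entirely hypothetical; the point is
only that a mass online touches pages that were never given a frame:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGES 8	/* toy section size, for illustration only */

enum page_state { PAGE_UNBACKED, PAGE_BACKED, PAGE_ONLINE };

struct section_model {
	enum page_state state[MODEL_PAGES];
};

/* The hypervisor has supplied a machine frame for this page. */
static void model_attach_frame(struct section_model *m, size_t idx)
{
	assert(idx < MODEL_PAGES);
	if (m->state[idx] == PAGE_UNBACKED)
		m->state[idx] = PAGE_BACKED;
}

/* Online one page, refusing if nothing backs it.
 * Returns 0 on success, -1 if the page is unbacked. */
static int model_online_page(struct section_model *m, size_t idx)
{
	assert(idx < MODEL_PAGES);
	if (m->state[idx] == PAGE_UNBACKED)
		return -1;
	m->state[idx] = PAGE_ONLINE;
	return 0;
}

/* Mass online of the whole section, as the sysfs path would attempt.
 * Returns how many pages could not be onlined safely. */
static size_t model_online_all(struct section_model *m)
{
	size_t idx, refused = 0;

	for (idx = 0; idx < MODEL_PAGES; idx++)
		if (model_online_page(m, idx) != 0)
			refused++;
	return refused;
}
```

In the real kernel there is no such check on the sysfs path, so the
refused pages in this model correspond to the pages that blow up.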

So I'd really like to inhibit the sysfs interface on these sections.  
Thoughts?

Thanks,
    J


* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27 20:54       ` Jeremy Fitzhardinge
@ 2008-03-28  0:20         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-03-28  0:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Yasunori Goto, Christoph Lameter, Linux Kernel Mailing List,
	Anthony Liguori, Chris Wright

On Thu, 27 Mar 2008 13:54:52 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 26 Mar 2008 22:57:57 -0700
> > Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >
> >   
> >> Ah, I see what it is.  I wasn't trying to add enough memory.  It adds in 
> >> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE.  When I 
> >> increase the initial balloon extension to PAGES_PER_SECTION pages, I 
> >> make some more progress:
> >>
> >> xen_balloon: Initialising balloon driver.
> >> trying to reserve 262144 pages (1073741824 bytes) for balloon
> >> bootmem alloc of 147456 bytes failed!
> >> Kernel panic - not syncing: Out of memory
> >> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
> >>  [<c01299dc>] panic+0x49/0x102
> >>  [<c0647c3c>] __alloc_bootmem+0x24/0x29
> >>  [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
> >>  [<c044bd97>] zone_wait_table_init+0x45/0x95
> >>  [<c0467258>] init_currently_empty_zone+0x1d/0xaa
> >>  [<c01738ea>] __add_pages+0x88/0xdb
> >>  [<c011c1a5>] arch_add_memory+0x25/0x2b
> >>  [<c01737a9>] add_memory_resource+0x2f/0x36
> >>  [<c064e487>] balloon_init+0x1b8/0x2b9
> >>  [<c0635495>] kernel_init+0x137/0x292
> >>  [<c063535e>] ? kernel_init+0x0/0x292
> >>  [<c063535e>] ? kernel_init+0x0/0x292
> >>  [<c0108b67>] kernel_thread_helper+0x7/0x10
> >>  =======================
> >>
> >>
> >> What's the rationale for setting SECTION_SIZE_BITS to 30?  Seems like a 
> >> fairly large chunk.
> >>
> >>     
> > First, I believe a typical DIMM is larger than the section size
> > (2^SECTION_SIZE_BITS bytes).  The default is designed for hardware-based
> > hotplug.
> >
> > If you want to use memory hotplug in a virtualized environment, it's a
> > good idea to make the sections smaller.  PowerPC/IBM LPAR uses 16MB
> > chunks.
> >
> > It's a trade-off between section maintenance cost and the size of
> > plugged memory.  Please find the best value.
> 
> Hm, I tried reducing it to 2^28 (=256M), but I get a compilation failure:
> 
>   CC      arch/x86/kernel/asm-offsets.s
> In file included from /home/jeremy/hg/xen/paravirt/linux/include/linux/suspend.h:11,
>                  from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets_32.c:11,
>                  from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets.c:2:
> /home/jeremy/hg/xen/paravirt/linux/include/linux/mm.h:458:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> make[3]: *** [arch/x86/kernel/asm-offsets.s] Error 1
> 
Ah, the section number of the page is now encoded in page->flags.
(Sorry, I usually work on 64-bit memory hotplug...)
See mm.h:
==
 * There are three possibilities for how page->flags get
 * laid out.  The first is for the normal case, without
 * sparsemem.  The second is for sparsemem when there is
 * plenty of space for node and section.  The last is when
 * we have run out of space and have to fall back to an
 * alternate (slower) way of determining the node.
 *
 *        No sparsemem: |       NODE     | ZONE | ... | FLAGS |
 * with space for node: | SECTION | NODE | ZONE | ... | FLAGS |
 *   no space for node: | SECTION |     ZONE    | ... | FLAGS |
==

Hmm, on other archs, sparsemem-vmemmap lets us drop the section bits
(Christoph's recent work).  But on x86-32, the kernel's NORMAL area does
not seem large enough to hold a vmemmap.

I don't have a good idea for solving this right now.
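
The compile-time check that fails can be reproduced with simple
arithmetic.  A minimal userspace sketch; the constants (9 reserved flag
bits on 32-bit, 36 physical address bits under PAE, 2 zone bits, 0 node
bits) are assumptions about this kernel configuration rather than values
quoted in this thread, and the helper names are hypothetical:

```c
#include <assert.h>

#define FLAGS_RESERVED	 9	/* assumption: 32-bit kernels of this era */
#define MAX_PHYSMEM_BITS 36	/* assumption: x86-32 with PAE */
#define ZONES_WIDTH	 2	/* assumption: four zones */
#define NODES_WIDTH	 0	/* assumption: single node */

/* Bits needed in page->flags to number every section. */
static unsigned int sections_width(unsigned int section_size_bits)
{
	assert(section_size_bits <= MAX_PHYSMEM_BITS);
	return MAX_PHYSMEM_BITS - section_size_bits;
}

/* Mirrors the #error condition in mm.h: returns 1 if the section,
 * node and zone fields fit in the reserved bits, 0 otherwise. */
static int section_size_fits(unsigned int section_size_bits)
{
	return sections_width(section_size_bits) + NODES_WIDTH +
	       ZONES_WIDTH <= FLAGS_RESERVED;
}
```

Under these assumptions a 2^28 section size needs 8 section bits plus 2
zone bits, one more than the 9 available, while 2^29 needs exactly 9.
That is consistent with the build failing at 28 and succeeding at 29.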

Thanks,
-Kame






* Re: Trying to make use of hotplug memory for xen balloon driver
  2008-03-27 22:23   ` Jeremy Fitzhardinge
@ 2008-03-28 18:21     ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2008-03-28 18:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: KAMEZAWA Hiroyuki, Yasunori Goto, Christoph Lameter,
	Linux Kernel Mailing List, Anthony Liguori, Chris Wright

On Thu, 2008-03-27 at 15:23 -0700, Jeremy Fitzhardinge wrote:
> If the user does a mass online via /sys the system explodes because it
> onlines a large number of pages which have no backing memory.  Since 
> none of those pages can be mapped, the kernel explodes in a variety of
> interesting ways.

Yeah, it does look like you need some kind of partial onlining.

> So I'd really like to inhibit the sysfs interface on these sections.  
> Thoughts?

The balloon driver isn't an exact fit for memory hotplug as it stands,
so there are going to be a few growing pains here. :)

I'm not sure just inhibiting sysfs is the best thing.  What would you
think about adding partial sections, initializing the 'struct page's,
but just not touching the memory?

-- Dave

