xen-devel.lists.xenproject.org archive mirror
* PATCH: Hugepage support for Domains booting with 4KB pages
@ 2011-03-20 22:34 Keshav Darak
  2011-03-22 16:49 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 7+ messages in thread
From: Keshav Darak @ 2011-03-20 22:34 UTC (permalink / raw)
  To: xen-devel; +Cc: jeremy, keir


[-- Attachment #1.1: Type: text/plain, Size: 1636 bytes --]

We have implemented hugepage support for guests in the following manner.

In our implementation we added a parameter, hugepage_num, which is specified
in the config file of the DomU. It is the number of hugepages that the
guest is guaranteed to receive whenever the kernel asks for hugepages,
either via its boot-time parameter or by reserving them after boot
(e.g. echo XX > /proc/sys/vm/nr_hugepages). During creation of the domain
we reserve MFNs for these hugepages and store them in a list; the list head
lives inside the domain structure under the name "hugepage_list". When the
domain boots, the memory seen by the kernel is the allocated memory less
the amount required for the hugepages. The function reserve_hugepage_range
is called as an initcall; before it runs, xen_extra_mem_start points to
this apparent end of memory. In reserve_hugepage_range we reserve the PFN
range for the hugepages that the kernel will later allocate, by
incrementing xen_extra_mem_start, and we maintain these PFNs as pages in
"xen_hugepfn_list" in the kernel.

Before the kernel requests any hugepages, it makes a HYPERVISOR_memory_op
hypercall to get the count of hugepages allocated to it and reserves the
PFN range accordingly. Then, whenever the kernel requests a hugepage, it
makes another HYPERVISOR_memory_op hypercall to fetch a preallocated
hugepage and sets up the corresponding p2m mapping on both sides (Xen as
well as the kernel).
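
To make the flow concrete, here is a minimal guest-side sketch of the
sequence described above. It is only an illustration, not the attached
kernel patch itself: the helper names are made up, and the command numbers
XENMEM_hugepage_cnt (20) and XENMEM_populate_hugepage (19) are the ones
introduced by the attached Xen patch.

/* Sketch of the guest-side sequence: query how many hugepages the domain
 * was granted, then ask Xen for one of the preallocated 2 MB extents.
 * Error handling and the p2m/update_va_mapping wiring are omitted; see
 * the attached kernel patch for the real implementation. */
#include <xen/interface/xen.h>
#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>

#define XENMEM_populate_hugepage 19   /* added by the Xen patch */
#define XENMEM_hugepage_cnt      20   /* added by the Xen patch */

static unsigned long xen_hugepage_count(void)
{
        struct xen_memory_reservation res = {
                .extent_order = 0,
                .nr_extents   = 1,
                .domid        = DOMID_SELF,
        };

        /* Returns the number of hugepages reserved for this domain. */
        return HYPERVISOR_memory_op(XENMEM_hugepage_cnt, &res);
}

static long xen_claim_hugepage(unsigned long pfn)
{
        unsigned long frame = pfn;      /* first PFN of the 2 MB range */
        struct xen_memory_reservation res = {
                .extent_order = 9,      /* 2 MB = 512 x 4 KB pages */
                .nr_extents   = 1,
                .domid        = DOMID_SELF,
        };

        set_xen_guest_handle(res.extent_start, &frame);
        /* Xen pops one MFN range off the domain's hugepage_list and assigns
         * it; the guest then wires up its own p2m entries for the range. */
        return HYPERVISOR_memory_op(XENMEM_populate_hugepage, &res);
}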

The approach can be better explained using the presentation attached.

--
Keshav Darak
Kaustubh Kabra
Ashwin Vasani 
Aditya Gadre



      

[-- Attachment #1.2: Type: text/html, Size: 1792 bytes --]

[-- Attachment #2: xen_patch_210311_0227.patch --]
[-- Type: application/x-download, Size: 18234 bytes --]

[-- Attachment #3: jeremy-kernel.patch --]
[-- Type: application/x-download, Size: 6731 bytes --]

[-- Attachment #4: our_hugepage_approach.ppt --]
[-- Type: application/vnd.ms-powerpoint, Size: 327168 bytes --]

[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
@ 2011-03-21 21:01 Keshav Darak
  2011-03-21 21:31 ` Keir Fraser
  0 siblings, 1 reply; 7+ messages in thread
From: Keshav Darak @ 2011-03-21 21:01 UTC (permalink / raw)
  To: xen-devel; +Cc: jeremy, keir


[-- Attachment #1.1: Type: text/plain, Size: 2227 bytes --]

I have corrected a few mistakes in the previously attached Xen patch file.
Please review it.

--- On Sun, 3/20/11, Keshav Darak <keshav_darak@yahoo.com> wrote:

From: Keshav Darak <keshav_darak@yahoo.com>
Subject: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB pages
To: xen-devel@lists.xensource.com
Cc: jeremy@goop.org, keir@xen.org
Date: Sunday, March 20, 2011, 10:34 PM

We have implemented hugepage support for guests in the following manner.

In our implementation we added a parameter, hugepage_num, which is specified
in the config file of the DomU. It is the number of hugepages that the
guest is guaranteed to receive whenever the kernel asks for hugepages,
either via its boot-time parameter or by reserving them after boot
(e.g. echo XX > /proc/sys/vm/nr_hugepages). During creation of the domain
we reserve MFNs for these hugepages and store them in a list; the list head
lives inside the domain structure under the name "hugepage_list". When the
domain boots, the memory seen by the kernel is the allocated memory less
the amount required for the hugepages. The function reserve_hugepage_range
is called as an initcall; before it runs, xen_extra_mem_start points to
this apparent end of memory. In reserve_hugepage_range we reserve the PFN
range for the hugepages that the kernel will later allocate, by
incrementing xen_extra_mem_start, and we maintain these PFNs as pages in
"xen_hugepfn_list" in the kernel.

Before the kernel requests any hugepages, it makes a HYPERVISOR_memory_op
hypercall to get the count of hugepages allocated to it and reserves the
PFN range accordingly. Then, whenever the kernel requests a hugepage, it
makes another HYPERVISOR_memory_op hypercall to fetch a preallocated
hugepage and sets up the corresponding p2m mapping on both sides (Xen as
well as the kernel).
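
On the toolstack side, the hugepage pool is carved out at domain-build time
through the new libxc call added in the patch below. A hedged usage sketch
(only xc_domain_populate_hugemap and the order value come from the patch;
the wrapper name is invented for illustration):

/* Sketch: reserve `hugepage_num` 2 MB extents (order 9 = 512 x 4 KB pages)
 * for a freshly built domain, mirroring the call made from xc_dom_x86.c
 * in the attached patch. */
#include <xenctrl.h>

static int reserve_domain_hugepages(xc_interface *xch, uint32_t domid,
                                    unsigned long hugepage_num,
                                    xen_pfn_t *pfns)
{
    /* Issues XENMEM_populate_hugemap: Xen allocates hugepage_num
     * superpage-order extents and queues them on the domain's
     * hugepage_list for later XENMEM_populate_hugepage requests. */
    return xc_domain_populate_hugemap(xch, domid, hugepage_num,
                                      9 /* extent order */,
                                      0 /* mem_flags */, pfns);
}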

The approach can be better explained using the presentation attached.

--
Keshav Darak
Kaustubh Kabra
Ashwin Vasani 
Aditya Gadre



      
-----Inline Attachment Follows-----

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel



      

[-- Attachment #1.2: Type: text/html, Size: 2928 bytes --]

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: xen.patch --]
[-- Type: text/x-patch; name="xen.patch", Size: 19814 bytes --]

diff -r 4e108cf56d07 tools/libxc/xc_dom.h
--- a/tools/libxc/xc_dom.h	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/libxc/xc_dom.h	Mon Mar 21 11:29:26 2011 +0530
@@ -113,6 +113,7 @@
     domid_t guest_domid;
     int8_t vhpt_size_log2; /* for IA64 */
     int8_t superpages;
+    int hugepage_num;
     int shadow_enabled;
 
     int xen_version;
diff -r 4e108cf56d07 tools/libxc/xc_dom_core.c
--- a/tools/libxc/xc_dom_core.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/libxc/xc_dom_core.c	Mon Mar 21 11:29:26 2011 +0530
@@ -699,6 +699,20 @@
 
     page_shift = XC_DOM_PAGE_SHIFT(dom);
     nr_pages = mem_mb << (20 - page_shift);
+    
+    //a2k2
+    if(dom->hugepage_num && dom->superpages!=1)
+        {
+
+            nr_pages-=dom->hugepage_num*512;
+
+    }
+    if(nr_pages<=0)
+        {
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, "%s: Allocated memory less than required for hugepages",
+                     __FUNCTION__);
+        return -1;
+        }
 
     DOMPRINTF("%s: mem %d MB, pages 0x%" PRIpfn " pages, %dk each",
                __FUNCTION__, mem_mb, nr_pages, 1 << (page_shift-10));
diff -r 4e108cf56d07 tools/libxc/xc_dom_x86.c
--- a/tools/libxc/xc_dom_x86.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/libxc/xc_dom_x86.c	Mon Mar 21 11:29:26 2011 +0530
@@ -747,9 +747,18 @@
             for ( j = 0; j < SUPERPAGE_NR_PFNS; j++, pfn++ )
                 dom->p2m_host[pfn] = mfn + j;
         }
+
     }
     else
     {
+        /*a2k2 setting up hugepages pool for domain in xen from its mem_size not allocated as free pages to domain. */
+        if(dom->hugepage_num)
+        {
+            rc = xc_domain_populate_hugemap(
+                dom->xch, dom->guest_domid, dom->hugepage_num,
+                9, 0, &dom->p2m_host[0]);
+
+        }
         /* setup initial p2m */
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
             dom->p2m_host[pfn] = pfn;
diff -r 4e108cf56d07 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/libxc/xc_domain.c	Mon Mar 21 11:29:26 2011 +0530
@@ -729,6 +729,34 @@
     return do_memory_op(xch, XENMEM_add_to_physmap, &xatp, sizeof(xatp));
 }
 
+int xc_domain_populate_hugemap(xc_interface *xch,
+                               uint32_t domid,
+                               unsigned long nr_extents,
+                               unsigned int extent_order,
+                               unsigned int mem_flags,
+                               xen_pfn_t *extent_start)
+{
+    int err;
+    DECLARE_HYPERCALL_BOUNCE(extent_start, nr_extents * sizeof(*extent_start), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+    struct xen_memory_reservation reservation = {
+        .nr_extents   = nr_extents,
+        .extent_order = extent_order,
+        .mem_flags    = mem_flags,
+        .domid        = domid
+    };
+
+    if ( xc_hypercall_bounce_pre(xch, extent_start) )
+    {
+        PERROR("Could not bounce memory for XENMEM_populate_physmap hypercall");
+        return -1;
+    }
+    set_xen_guest_handle(reservation.extent_start, extent_start);
+
+    err = do_memory_op(xch, XENMEM_populate_hugemap, &reservation, sizeof(reservation));
+
+    xc_hypercall_bounce_post(xch, extent_start);
+    return err;
+}
 int xc_domain_populate_physmap(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_extents,
diff -r 4e108cf56d07 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/libxc/xenctrl.h	Mon Mar 21 11:29:26 2011 +0530
@@ -1006,6 +1006,13 @@
                                unsigned int mem_flags,
                                xen_pfn_t *extent_start);
 
+int xc_domain_populate_hugemap(xc_interface *xch,
+                               uint32_t domid,
+                               unsigned long nr_extents,
+                               unsigned int extent_order,
+                               unsigned int mem_flags,
+                               xen_pfn_t *extent_start);
+
 int xc_domain_populate_physmap_exact(xc_interface *xch,
                                      uint32_t domid,
                                      unsigned long nr_extents,
diff -r 4e108cf56d07 tools/python/xen/lowlevel/xc/xc.c
--- a/tools/python/xen/lowlevel/xc/xc.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/lowlevel/xc/xc.c	Mon Mar 21 11:29:26 2011 +0530
@@ -455,6 +455,7 @@
     int store_evtchn, console_evtchn;
     int vhpt = 0;
     int superpages = 0;
+    int hugepage_num = 0;
     unsigned int mem_mb;
     unsigned long store_mfn = 0;
     unsigned long console_mfn = 0;
@@ -467,14 +468,14 @@
                                 "console_evtchn", "image",
                                 /* optional */
                                 "ramdisk", "cmdline", "flags",
-                                "features", "vhpt", "superpages", NULL };
-
-    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiiis|ssisii", kwd_list,
+                                "features", "vhpt", "superpages","hugepage_num", NULL };
+
+    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiiis|ssisiii", kwd_list,
                                       &domid, &store_evtchn, &mem_mb,
                                       &console_evtchn, &image,
                                       /* optional */
                                       &ramdisk, &cmdline, &flags,
-                                      &features, &vhpt, &superpages) )
+                                      &features, &vhpt, &superpages,&hugepage_num) )
         return NULL;
 
     xc_dom_loginit(self->xc_handle);
@@ -484,6 +485,7 @@
     /* for IA64 */
     dom->vhpt_size_log2 = vhpt;
 
+  dom->hugepage_num=hugepage_num;
     dom->superpages = superpages;
 
     if ( xc_dom_linux_build(self->xc_handle, dom, domid, mem_mb, image,
diff -r 4e108cf56d07 tools/python/xen/xend/XendConfig.py
--- a/tools/python/xen/xend/XendConfig.py	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/xend/XendConfig.py	Mon Mar 21 11:29:26 2011 +0530
@@ -244,6 +244,7 @@
     'memory_sharing': int,
     'pool_name' : str,
     'Description': str,
+    'hugepage_num':int,
 }
 
 # List of legacy configuration keys that have no equivalent in the
@@ -423,6 +424,7 @@
             'pool_name' : 'Pool-0',
             'superpages': 0,
             'description': '',
+            'hugepage_num':0,
         }
         
         return defaults
@@ -2135,6 +2137,8 @@
             image.append(['args', self['PV_args']])
         if self.has_key('superpages'):
             image.append(['superpages', self['superpages']])
+	if self.has_key('hugepage_num'):
+            image.append(['hugepage_num', self['hugepage_num']])
 
         for key in XENAPI_PLATFORM_CFG_TYPES.keys():
             if key in self['platform']:
@@ -2179,6 +2183,9 @@
         val = sxp.child_value(image_sxp, 'superpages')
         if val is not None:
             self['superpages'] = val
+        val = sxp.child_value(image_sxp, 'hugepage_num')
+        if val is not None:
+            self['hugepage_num'] = val
         
         val = sxp.child_value(image_sxp, 'memory_sharing')
         if val is not None:
diff -r 4e108cf56d07 tools/python/xen/xend/image.py
--- a/tools/python/xen/xend/image.py	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/xend/image.py	Mon Mar 21 11:29:26 2011 +0530
@@ -84,6 +84,7 @@
 
     ostype = None
     superpages = 0
+    hugepage_num = 0
     memory_sharing = 0
 
     def __init__(self, vm, vmConfig):
@@ -711,6 +712,7 @@
         self.vramsize = int(vmConfig['platform'].get('videoram',4)) * 1024
         self.is_stubdom = (self.kernel.find('stubdom') >= 0)
         self.superpages = int(vmConfig['superpages'])
+	self.hugepage_num = int(vmConfig['hugepage_num'])
 
     def buildDomain(self):
         store_evtchn = self.vm.getStorePort()
@@ -729,6 +731,7 @@
         log.debug("features       = %s", self.vm.getFeatures())
         log.debug("flags          = %d", self.flags)
         log.debug("superpages     = %d", self.superpages)
+	log.debug("hugepage_num   = %d", self.hugepage_num)
         if arch.type == "ia64":
             log.debug("vhpt          = %d", self.vhpt)
 
@@ -742,7 +745,8 @@
                               features       = self.vm.getFeatures(),
                               flags          = self.flags,
                               vhpt           = self.vhpt,
-                              superpages     = self.superpages)
+                              superpages     = self.superpages,
+			      hugepage_num   = self.hugepage_num)
 
     def getBitSize(self):
         return xc.getBitSize(image    = self.kernel,
diff -r 4e108cf56d07 tools/python/xen/xm/create.dtd
--- a/tools/python/xen/xm/create.dtd	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/xm/create.dtd	Mon Mar 21 11:29:26 2011 +0530
@@ -56,6 +56,7 @@
                  actions_after_crash    %CRASH_BEHAVIOUR; #REQUIRED
                  PCI_bus                CDATA #REQUIRED
                  superpages             CDATA #REQUIRED
+		 hugepage_num		CDATA #REQUIRED
                  security_label         CDATA #IMPLIED>
 
 <!ELEMENT memory EMPTY> 
diff -r 4e108cf56d07 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/xm/create.py	Mon Mar 21 11:29:26 2011 +0530
@@ -680,6 +680,11 @@
            fn=set_int, default=0,
            use="Create domain with superpages")
 
+gopts.var('hugepage_num', val='NUM',
+           fn=set_int, default=0,
+           use="Domain with hugepages support")
+
+
 def err(msg):
     """Print an error to stderr and exit.
     """
@@ -770,6 +775,9 @@
         config_image.append(['args', vals.extra])
     if vals.superpages:
         config_image.append(['superpages', vals.superpages])
+    if vals.hugepage_num:
+        config_image.append(['hugepage_num', vals.hugepage_num])
+
 
     if vals.builder == 'hvm':
         configure_hvm(config_image, vals) 
diff -r 4e108cf56d07 tools/python/xen/xm/xenapi_create.py
--- a/tools/python/xen/xm/xenapi_create.py	Mon Dec 27 08:00:09 2010 +0000
+++ b/tools/python/xen/xm/xenapi_create.py	Mon Mar 21 11:29:26 2011 +0530
@@ -285,6 +285,8 @@
                 vm.attributes["s3_integrity"].value,
             "superpages":
                 vm.attributes["superpages"].value,
+ 	    "hugepage_num":
+                vm.attributes["hugepage_num"].value,
             "memory_static_max":
                 get_child_node_attribute(vm, "memory", "static_max"),
             "memory_static_min":
@@ -697,6 +699,8 @@
             = str(get_child_by_name(config, "s3_integrity", 0))
         vm.attributes["superpages"] \
             = str(get_child_by_name(config, "superpages", 0))
+	vm.attributes["hugepage_num"] \
+            = str(get_child_by_name(config, "hugepage_num", 0))
         vm.attributes["pool_name"] \
             = str(get_child_by_name(config, "pool_name", "Pool-0"))
 
diff -r 4e108cf56d07 xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/arch/x86/setup.c	Mon Mar 21 11:29:26 2011 +0530
@@ -44,6 +44,7 @@
 #include <asm/mach-generic/mach_apic.h> /* for generic_apic_probe */
 #include <asm/setup.h>
 #include <xen/cpu.h>
+
 
 extern u16 boot_edid_caps;
 extern u8 boot_edid_info[128];
@@ -60,7 +61,10 @@
 /* opt_watchdog: If true, run a watchdog NMI on each processor. */
 static bool_t __initdata opt_watchdog;
 boolean_param("watchdog", opt_watchdog);
-
+//a2k2:
+static unsigned int __initdata dom0_hugepages=0;
+integer_param("dom0_hugepages", dom0_hugepages);
+int allocate_hugepages(struct domain *,int,int);
 /* **** Linux config option: propagated to domain0. */
 /* "acpi=off":    Sisables both ACPI table parsing and interpreter. */
 /* "acpi=force":  Override the disable blacklist.                   */
@@ -1259,7 +1263,10 @@
     
     if ( !tboot_protect_mem_regions() )
         panic("Could not protect TXT memory regions\n");
+    //a2k2:
 
+    //init_hugepages_pool();
+    printk("a2k2:Dom0 hugepages are :%u\n",dom0_hugepages);
     /* Create initial domain 0. */
     dom0 = domain_create(0, DOMCRF_s3_integrity, DOM0_SSIDREF);
     if ( (dom0 == NULL) || (alloc_dom0_vcpu0() == NULL) )
@@ -1267,7 +1274,7 @@
 
     dom0->is_privileged = 1;
     dom0->target = NULL;
-
+    allocate_hugepages(dom0,dom0_hugepages,SUPERPAGE_ORDER);
     /* Grab the DOM0 command line. */
     cmdline = (char *)(mod[0].string ? __va(mod[0].string) : NULL);
     if ( (cmdline != NULL) || (kextra != NULL) )
diff -r 4e108cf56d07 xen/common/domain.c
--- a/xen/common/domain.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/common/domain.c	Mon Mar 21 11:29:26 2011 +0530
@@ -240,9 +240,9 @@
     spin_lock_init(&d->hypercall_deadlock_mutex);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
-
+    INIT_PAGE_LIST_HEAD(&d->hugepage_list);
     spin_lock_init(&d->node_affinity_lock);
-
+    d->hugepage_num=0;
     spin_lock_init(&d->shutdown_lock);
     d->shutdown_code = -1;
 
@@ -441,7 +441,7 @@
 int domain_kill(struct domain *d)
 {
     int rc = 0;
-
+    struct page_info* page;
     if ( d == current->domain )
         return -EINVAL;
 
@@ -451,6 +451,12 @@
     case DOMDYING_alive:
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
+         while(!page_list_empty(&(d->hugepage_list)))
+            {
+          
+                page=page_list_remove_head(&(d->hugepage_list));
+                free_domheap_pages(page,SUPERPAGE_ORDER);
+            }
         spin_barrier(&d->domain_lock);
         evtchn_destroy(d);
         gnttab_release_mappings(d);
diff -r 4e108cf56d07 xen/common/memory.c
--- a/xen/common/memory.c	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/common/memory.c	Mon Mar 21 11:29:26 2011 +0530
@@ -21,6 +21,7 @@
 #include <xen/errno.h>
 #include <xen/tmem.h>
 #include <xen/tmem_xen.h>
+
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #ifdef CONFIG_X86
@@ -89,8 +90,44 @@
  out:
     a->nr_done = i;
 }
+int allocate_hugepages(struct domain *d,int hugepage_num,int order){
+    int i=0;
+    struct page_info *page;
+    if(order!=SUPERPAGE_ORDER)
+    {
+        goto out_huge; 
+    }
+    for(i=0;i<hugepage_num;i++){
+        page = alloc_domheap_pages(NULL, order,0);
+        if(page==NULL){
+            printk("a2k2: couldn't allocate hugepages for the Domain %d \n",d->domain_id);
+            goto out_huge;
+        }
+        if ( d->domain_id ){
+            if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages)){
+                 if ( !opt_tmem || order != 0 || d->tot_pages != d->max_pages )
+                     gdprintk(XENLOG_INFO, "Over-allocation for domain %u: "
+                              "%u > %u\n", d->domain_id,
+                              d->tot_pages + (1 << order), d->max_pages);
+                 goto err;
+            }
 
-static void populate_physmap(struct memop_args *a)
+            if ( unlikely(d->tot_pages == 0) )
+                get_knownalive_domain(d);
+            d->tot_pages += 1 << order;
+         }
+         page_list_add(page,&(d->hugepage_list));    
+    }
+    goto out_huge;
+err:
+    free_domheap_pages(page,order);
+    out_huge:
+    d->hugepage_num+=i;
+    return i;
+}
+
+
+static void populate_physmap(struct memop_args *a,int flags)
 {
     struct page_info *page;
     unsigned long i, j;
@@ -123,7 +160,26 @@
         }
         else
         {
-            page = alloc_domheap_pages(d, a->extent_order, a->memflags);
+            if(flags){
+                //a2k2:
+                page=page_list_remove_head(&(d->hugepage_list));
+                if(page==NULL){
+                    // flags=0;
+                }
+                else
+                {
+                    if(d->domain_id)
+                        d->tot_pages-=1 << a->extent_order;
+                    if(assign_pages(d,page,a->extent_order,a->memflags)==-1){
+                        printk("a2k2: hugepage assignment to domain failed.\n");
+
+                        goto out;
+                     }
+                }
+            }
+            if(!flags)
+                page = alloc_domheap_pages(d, a->extent_order, a->memflags);
+           
             if ( unlikely(page == NULL) ) 
             {
                 if ( !opt_tmem || (a->extent_order != 0) )
@@ -511,9 +567,13 @@
 
     switch ( op )
     {
+    case XENMEM_hugepage_cnt:
     case XENMEM_increase_reservation:
     case XENMEM_decrease_reservation:
     case XENMEM_populate_physmap:
+    case XENMEM_populate_hugemap:
+    case XENMEM_populate_hugepage:
+    
         start_extent = cmd >> MEMOP_EXTENT_SHIFT;
 
         if ( copy_from_guest(&reservation, arg, 1) )
@@ -581,8 +641,17 @@
         case XENMEM_decrease_reservation:
             decrease_reservation(&args);
             break;
+        case XENMEM_populate_hugepage:
+            populate_physmap(&args,1);
+            break;
+        case XENMEM_populate_hugemap:
+            args.nr_done=allocate_hugepages(args.domain,args.nr_extents,args.extent_order);
+            break;
+        case XENMEM_hugepage_cnt:
+            args.nr_done=d->hugepage_num;
+            break;
         default: /* XENMEM_populate_physmap */
-            populate_physmap(&args);
+            populate_physmap(&args,0);
             break;
         }
 
@@ -596,7 +665,7 @@
                 op | (rc << MEMOP_EXTENT_SHIFT), arg);
 
         break;
-
+    
     case XENMEM_exchange:
         rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
         break;
diff -r 4e108cf56d07 xen/include/public/memory.h
--- a/xen/include/public/memory.h	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/include/public/memory.h	Mon Mar 21 11:29:26 2011 +0530
@@ -37,6 +37,9 @@
 #define XENMEM_increase_reservation 0
 #define XENMEM_decrease_reservation 1
 #define XENMEM_populate_physmap     6
+#define XENMEM_populate_hugepage    19
+#define XENMEM_hugepage_cnt         20
+#define XENMEM_populate_hugemap     21
 
 #if __XEN_INTERFACE_VERSION__ >= 0x00030209
 /*
diff -r 4e108cf56d07 xen/include/xen/mm.h
--- a/xen/include/xen/mm.h	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/include/xen/mm.h	Mon Mar 21 11:29:26 2011 +0530
@@ -96,14 +96,19 @@
 #endif
 
 #define page_list_entry list_head
+#define hugepage_list_entry list_head
 
 #include <asm/mm.h>
 
 #ifndef page_list_entry
 struct page_list_head
 {
-    struct page_info *next, *tail;
+    struct page_info *next, *tail; 
 };
+/*struct hugepage_list_head
+{
+  struct hugepage_info *next,*tail;
+  };*/
 /* These must only have instances in struct page_info. */
 # define page_list_entry
 
@@ -326,5 +331,15 @@
 #define RAM_TYPE_ACPI         0x00000008
 /* TRUE if the whole page at @mfn is of the requested RAM type(s) above. */
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type);
+/*
+#define hugepage_list_head list_head
 
+//a2k2
+struct hugepage_info
+{
+  mfn_t mfn;
+  hugepage_list_entry hugepage_list;
+
+
+  };*/
 #endif /* __XEN_MM_H__ */
diff -r 4e108cf56d07 xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Mon Dec 27 08:00:09 2010 +0000
+++ b/xen/include/xen/sched.h	Mon Mar 21 11:29:26 2011 +0530
@@ -211,6 +211,9 @@
     spinlock_t       page_alloc_lock; /* protects all the following fields  */
     struct page_list_head page_list;  /* linked list, of size tot_pages     */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
+    struct page_list_head hugepage_list; /*a2k2:Free hugepage list*/
+    unsigned int     hugepage_num;    /*a2k2:  */
+
     unsigned int     tot_pages;       /* number of pages currently possesed */
     unsigned int     max_pages;       /* maximum value for tot_pages        */
     atomic_t         shr_pages;       /* number of shared pages             */

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
  2011-03-21 21:01 PATCH: Hugepage support for Domains booting with 4KB pages Keshav Darak
@ 2011-03-21 21:31 ` Keir Fraser
  2011-03-22 12:36   ` Keshav Darak
  0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2011-03-21 21:31 UTC (permalink / raw)
  To: Keshav Darak, xen-devel; +Cc: jeremy

Keshav,

There is already optional support for superpage allocations and mappings for
PV guests in the hypervisor and toolstack. See the opt_allow_superpages
boolean flag in the hypervisor, and the 'superpages' domain config option
that can be specified when creating a new domain via xend/xm.

 -- Keir

On 21/03/2011 21:01, "Keshav Darak" <keshav_darak@yahoo.com> wrote:

> have corrected few mistakes in previously attached xen patch file.
> Please review it.
> 
> --- On Sun, 3/20/11, Keshav Darak <keshav_darak@yahoo.com> wrote:
>> 
>> From: Keshav Darak <keshav_darak@yahoo.com>
>> Subject: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB
>> pages
>> To: xen-devel@lists.xensource.com
>> Cc: jeremy@goop.org, keir@xen.org
>> Date: Sunday, March 20, 2011, 10:34 PM
>> 
>> We have implemented hugepage support for guests in following manner
>> 
>> In our implementation we added a parameter hugepage_num which is specified in
>> the config file of the DomU. It is the number of hugepages that the guest is
>> guaranteed to receive whenever the kernel asks for hugepage by using its boot
>> time parameter or reserving after booting (eg. Using echo XX >
>> /proc/sys/vm/nr_hugepages). During creation of the domain we reserve MFN's
>> for these hugepages and store them in the list. The listhead of this list is
>> inside the domain structure with name "hugepage_list". When the domain is
>> booting, at that time the memory seen by the kernel is allocated memory  less
>> the amount required for hugepages. The function reserve_hugepage_range is
>> called as a initcall. Before this function the xen_extra_mem_start points to
>> this apparent end of the memory. In this function we reserve the PFN range
>> for the hugepages which are going to be allocated by kernel by incrementing
>> the xen_extra_mem_start. We maintain these PFNs as pages in
>> "xen_hugepfn_list" in the kernel.
>> 
>> Now before the kernel requests for hugepages, it makes a hypercall
>> HYPERVISOR_memory_op  to get count of hugepages allocated to it and
>> accordingly reserves the pfn range.
>> then whenever kernel requests for hugepages it again make hypercall
>> HYPERVISOR_memory_op to get the preallocated hugepage and according makes the
>> p2m mapping on both sides (xen as well as kernel side)
>> 
>> The approach can be better explained using the presentation attached.
>> 
>> --
>> Keshav Darak
>> Kaustubh Kabra
>> Ashwin Vasani 
>> Aditya Gadre
>> 
>>  
>> 
>> -----Inline Attachment Follows-----
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
  2011-03-21 21:31 ` Keir Fraser
@ 2011-03-22 12:36   ` Keshav Darak
  2011-03-22 14:07     ` Keir Fraser
  0 siblings, 1 reply; 7+ messages in thread
From: Keshav Darak @ 2011-03-22 12:36 UTC (permalink / raw)
  To: xen-devel, Keir Fraser; +Cc: jeremy


[-- Attachment #1.1: Type: text/plain, Size: 3866 bytes --]

Keir,
    We are aware of it, and we have to use the 'opt_allow_superpages' boolean
flag in our implementation too. But when the superpages flag is used in the
domain configuration file, the entire domain boots on hugepages (superpages);
if the memory needed to back the whole domain with hugepages is not available,
the domain does not boot at all.
    In our implementation, by contrast, we aim to give the domain only as many
hugepages as it actually requires (via the "hugepage_num" option in the config
file), so the entire domain need not be booted on hugepages.
    This supports domains that boot with 4 KB pages and can still use
hugepages, which greatly reduces the number of hugepages a domain needs just
to boot.
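
As an illustration only, a hypothetical xm config fragment (all values are
made up for the example; only the hugepage_num key comes from this patch):

# DomU boots on ordinary 4 KB pages, but is guaranteed 64 x 2 MB hugepages
# whenever its kernel asks for them (boot parameter or nr_hugepages).
kernel       = "/boot/vmlinuz-2.6-domU"
memory       = 1024
name         = "hugepage-guest"
hugepage_num = 64
# With the existing 'superpages = 1' option, by contrast, the entire
# domain has to be backed by superpages.
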
--- On Mon, 3/21/11, Keir Fraser <keir.xen@gmail.com> wrote:

From: Keir Fraser <keir.xen@gmail.com>
Subject: Re: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB pages
To: "Keshav Darak" <keshav_darak@yahoo.com>, xen-devel@lists.xensource.com
Cc: jeremy@goop.org
Date: Monday, March 21, 2011, 9:31 PM

Keshav,

There is already optional support for superpage allocations and mappings for
PV guests in the hypervisor and toolstack. See the opt_allow_superpages
boolean flag in the hypervisor, and the 'superpages' domain config option
that can be specified when creating a new domain via xend/xm.

 -- Keir

On 21/03/2011 21:01, "Keshav Darak" <keshav_darak@yahoo.com> wrote:

> have corrected few mistakes in previously attached xen patch file.
> Please review it.
> 
> --- On Sun, 3/20/11, Keshav Darak <keshav_darak@yahoo.com> wrote:
>> 
>> From: Keshav Darak <keshav_darak@yahoo.com>
>> Subject: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB
>> pages
>> To: xen-devel@lists.xensource.com
>> Cc: jeremy@goop.org, keir@xen.org
>> Date: Sunday, March 20, 2011, 10:34 PM
>> 
>> We have implemented hugepage support for guests in following manner
>> 
>> In our implementation we added a parameter hugepage_num which is specified in
>> the config file of the DomU. It is the number of hugepages that the guest is
>> guaranteed to receive whenever the kernel asks for hugepage by using its boot
>> time parameter or reserving after booting (eg. Using echo XX >
>> /proc/sys/vm/nr_hugepages). During creation of the domain we reserve MFN's
>> for these hugepages and store them in the list. The listhead of this list is
>> inside the domain structure with name "hugepage_list". When the domain is
>> booting, at that time the memory seen by the kernel is allocated memory  less
>> the amount required for hugepages. The function reserve_hugepage_range is
>> called as a initcall. Before this function the xen_extra_mem_start points to
>> this apparent end of the memory. In this function we reserve the PFN range
>> for the hugepages which are going to be allocated by kernel by incrementing
>> the xen_extra_mem_start. We maintain these PFNs as pages in
>> "xen_hugepfn_list" in the kernel.
>> 
>> Now before the kernel requests for hugepages, it makes a hypercall
>> HYPERVISOR_memory_op  to get count of hugepages allocated to it and
>> accordingly reserves the pfn range.
>> then whenever kernel requests for hugepages it again make hypercall
>> HYPERVISOR_memory_op to get the preallocated hugepage and according makes the
>> p2m mapping on both sides (xen as well as kernel side)
>> 
>> The approach can be better explained using the presentation attached.
>> 
>> --
>> Keshav Darak
>> Kaustubh Kabra
>> Ashwin Vasani 
>> Aditya Gadre
>> 
>>  
>> 
>> -----Inline Attachment Follows-----
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
> 
> 





      

[-- Attachment #1.2: Type: text/html, Size: 5539 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
  2011-03-22 12:36   ` Keshav Darak
@ 2011-03-22 14:07     ` Keir Fraser
  0 siblings, 0 replies; 7+ messages in thread
From: Keir Fraser @ 2011-03-22 14:07 UTC (permalink / raw)
  To: Keshav Darak, xen-devel; +Cc: jeremy

On 22/03/2011 12:36, "Keshav Darak" <keshav_darak@yahoo.com> wrote:

> Keir,
>     We are aware of it and we have to use 'opt_allow_superpages' boolean flag
> in our implementation too. But when we use superpages flag in domain
> configuration file,
> entire domain boots on hugepages (superpages).If the specified memory in
> 'hugepages' for the domain is not available, then the domain does not boot.
>      But in our implementation , we target to give only those many hugepages (
> using "hugepage_num" option in config file) to the domain that it actually
> requires and hence entire domain need not be booted on hugepages.
>      This is to support domains that boot with 4 KB pages and still can use
> hugepages. So,
> the pressure on the number of hugepages required for a domain even to boot is
> reduced to a great extent.

Okay, I don't see why that would need further changes in the hypervisor
itself, however.

 -- Keir

> --- On Mon, 3/21/11, Keir Fraser <keir.xen@gmail.com> wrote:
>> 
>> From: Keir Fraser <keir.xen@gmail.com>
>> Subject: Re: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB
>> pages
>> To: "Keshav Darak" <keshav_darak@yahoo.com>, xen-devel@lists.xensource.com
>> Cc: jeremy@goop.org
>> Date: Monday, March 21, 2011, 9:31 PM
>> 
>> Keshav,
>> 
>> There is already optional support for superpage allocations and mappings for
>> PV guests in the hypervisor and toolstack. See the opt_allow_superpages
>> boolean flag in the hypervisor, and the 'superpages' domain config option
>> that can be specified when creating a new domain via xend/xm.
>> 
>>  -- Keir
>> 
>> On 21/03/2011 21:01, "Keshav Darak" <keshav_darak@yahoo.com> wrote:
>> 
>>> have corrected few mistakes in previously attached xen patch file.
>>> Please review it.
>>> 
>>> --- On Sun, 3/20/11, Keshav Darak <keshav_darak@yahoo.com> wrote:
>>>> 
>>>> From: Keshav Darak <keshav_darak@yahoo.com>
>>>> Subject: [Xen-devel] PATCH: Hugepage support for Domains booting with 4KB
>>>> pages
>>>> To: xen-devel@lists.xensource.com
>>>> Cc: jeremy@goop.org, keir@xen.org
>>>> Date: Sunday, March 20, 2011, 10:34 PM
>>>> 
>>>> We have implemented hugepage support for guests in following manner
>>>> 
>>>> In our implementation we added a parameter hugepage_num which is specified
>>>> in
>>>> the config file of the DomU. It is the number of hugepages that the guest
>>>> is
>>>> guaranteed to receive whenever the kernel asks for hugepage by using its
>>>> boot
>>>> time parameter or reserving after booting (eg. Using echo XX >
>>>> /proc/sys/vm/nr_hugepages). During creation of the domain we reserve MFN's
>>>> for these hugepages and store them in the list. The listhead of this list
>>>> is
>>>> inside the domain structure with name "hugepage_list". When the domain is
>>>> booting, at that time the memory seen by the kernel is allocated memory
>>>> less
>>>> the amount required for hugepages. The function reserve_hugepage_range is
>>>> called as a initcall. Before this function the xen_extra_mem_start points
>>>> to
>>>> this apparent end of the memory. In this function we reserve the PFN range
>>>> for the hugepages which are going to be allocated by kernel by incrementing
>>>> the xen_extra_mem_start. We maintain these PFNs as pages in
>>>> "xen_hugepfn_list" in the kernel.
>>>> 
>>>> Now before the kernel requests for hugepages, it makes a hypercall
>>>> HYPERVISOR_memory_op  to get count of hugepages allocated to it and
>>>> accordingly reserves the pfn range.
>>>> then whenever kernel requests for hugepages it again make hypercall
>>>> HYPERVISOR_memory_op to get the preallocated hugepage and according makes
>>>> the
>>>> p2m mapping on both sides (xen as well as kernel side)
>>>> 
>>>> The approach can be better explained using the presentation attached.
>>>> 
>>>> --
>>>> Keshav Darak
>>>> Kaustubh Kabra
>>>> Ashwin Vasani 
>>>> Aditya Gadre
>>>> 
>>>>  
>>>> 
>>>> -----Inline Attachment Follows-----
>>>> 
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xensource.com
>>>> http://lists.xensource.com/xen-devel
>>> 
>>> 
>> 
>> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
  2011-03-20 22:34 Keshav Darak
@ 2011-03-22 16:49 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-03-22 16:49 UTC (permalink / raw)
  To: Keshav Darak; +Cc: jeremy, xen-devel, keir

On Sun, Mar 20, 2011 at 03:34:51PM -0700, Keshav Darak wrote:
> We have implemented hugepage support for guests in following manner
> 
> In
>  our implementation we added a parameter hugepage_num which is specified
>  in the config file of the DomU. It is the number of hugepages that the 
> guest is guaranteed to receive whenever the kernel asks for hugepage by 
> using its boot time parameter or reserving after booting (eg. Using echo
>  XX > /proc/sys/vm/nr_hugepages). During creation of the domain we 
> reserve MFN's for these hugepages and store them in the list. The 

There is a boot-time option for normal Linux kernels to set that up. Was
that something you could use?


In regards to the patch, I've some questions..

>diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
>index bf88684..7707e21 100644
>--- a/arch/x86/include/asm/hugetlb.h
>+++ b/arch/x86/include/asm/hugetlb.h
>@@ -98,6 +98,14 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> 
> 	return changed;
> }
>+#ifdef CONFIG_XEN
>+struct page* allocate_hugepage(int);
>+#else
>+static inline struct page * allocate_hugepage(int order)
>+{
>+        return NULL;
>+}
>+#endif

So it looks like you are exposing the allocate_hugepage to be out
of the hotplug memory. Could you do this via a pvops structure instead?
You should also separate this functionality into its own patch - i.e.,
expose allocate_hugepage on its own.
> 
> static inline int arch_prepare_hugepage(struct page *page)
> {
>index f46c340..00c489a 100644
>--- a/arch/x86/mm/hugetlbpage.c
>+++ b/arch/x86/mm/hugetlbpage.c
>@@ -147,8 +147,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
> 			pte = (pte_t *) pmd_alloc(mm, pud, addr);
> 		}
> 	}
>-	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>-
>+	BUG_ON(pte && !pte_none(*pte) && !((*pte).pte & (_AT(pteval_t, 1)<<7)));

Ugh. That is horrible.

why can't you use 'pte_huge' ? Is it b/c of this
 * (We should never see kernel mappings with _PAGE_PSE set,
 * but we could see hugetlbfs mappings, I think.).
 */
if (pat_enabled && !WARN_ON(pte & _PAGE_PAT)) { in xen/mmu.c? 

If so, have you thought about removing the warnings and/or changing
the logic there instead?

> 	return pte;
> }
> 
>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>index 070f138..c1db610 100644
>--- a/arch/x86/xen/enlighten.c
>+++ b/arch/x86/xen/enlighten.c
>@@ -58,6 +58,7 @@
> #include <asm/reboot.h>
> #include <asm/stackprotector.h>
> #include <asm/hypervisor.h>
>+#include <linux/list.h>
> 
> #include "xen-ops.h"
> #include "mmu.h"
>@@ -76,6 +77,9 @@ EXPORT_SYMBOL(machine_to_phys_mapping);
> unsigned int   machine_to_phys_order;
> EXPORT_SYMBOL(machine_to_phys_order);
> 
>+struct list_head xen_hugepfn_list;
>+EXPORT_SYMBOL_GPL(xen_hugepfn_list);
>+

Hmm, a list, but no locking? There is no need for a spinlock?

> struct start_info *xen_start_info;
> EXPORT_SYMBOL_GPL(xen_start_info);
> 
>diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>index 32a1c65..fa9fa6c 100644
>--- a/arch/x86/xen/mmu.c
>+++ b/arch/x86/xen/mmu.c
>@@ -44,6 +44,7 @@
> #include <linux/bug.h>
> #include <linux/vmalloc.h>
> #include <linux/module.h>
>+#include <linux/bootmem.h>
> 
> #include <asm/pgtable.h>
> #include <asm/tlbflush.h>
>@@ -2776,5 +2784,79 @@ static int __init xen_mmu_debugfs(void)
> 	return 0;
> }
> fs_initcall(xen_mmu_debugfs);
>+//a2k2
>+extern struct list_head xen_hugepfn_list;
>+static unsigned long frame_list[PAGE_SIZE / sizeof(unsigned long)];
>+
>+static void scrub_page(struct page *page)
>+{
>+#ifdef CONFIG_XEN_SCRUB_PAGES
>+		clear_highpage(page);
>+#endif

You don't seem to use this anywhere..
>+}
>+#ifdef CONFIG_HIGHMEM
>+#define inc_totalhigh_pages() (totalhigh_pages++)
>+#define dec_totalhigh_pages() (totalhigh_pages--)

so what do you do with those? They seem to be incremented, but
nothing seems to use them.
>+#else
>+#define inc_totalhigh_pages() do {} while(0)
>+#define dec_totalhigh_pages() do {} while(0)
>+#endif
>+
>+struct page* allocate_hugepage(int order){
>+    unsigned long  mfn, i, j;

You seem to be defining 'i' but not using it?

>+    struct page   *page;
>+    long           rc,cnt;

You should be using 'int' for rc.

>+    unsigned long pfn;
>+    struct xen_memory_reservation reservation = {
>+	.address_bits = 0,
>+	.domid        = DOMID_SELF
>+    };
>+    if(list_empty(&xen_hugepfn_list)){
>+        return NULL; 
>+    } 
>+    page = list_entry(xen_hugepfn_list.next, struct page, lru);
>+    list_del(&page->lru);
>+    frame_list[0] = page_to_pfn(page);
>+
>+    set_xen_guest_handle(reservation.extent_start, frame_list);
>+    reservation.nr_extents = 1;
>+    reservation.extent_order = 9;
shouldn't this be 'order' instead of 9?
>+    cnt=1<<order;
>+    rc = HYPERVISOR_memory_op(XENMEM_populate_hugepage, &reservation);
>+    if (rc <= 0)
>+    {
>+	printk("a2k2: could not allocate hugepage\n"); 

Please run 'scripts/checkpatch.pl' before posting. It will show that this is not good.
>+	goto out1;
>+    }
>+    pfn=page_to_pfn(page);
>+    if(rc)
>+    {
>+	mfn = frame_list[0];
>+	for (j = 0; j < cnt; j++, pfn++, mfn++) {
>+	    set_phys_to_machine(pfn, mfn);
>+	    if (pfn < max_low_pfn) {

... and what if 'pfn > max_low_pfn' ? What should we
do then?
>+	        int ret;
>+		ret = HYPERVISOR_update_va_mapping(
>+					       (unsigned long)__va(pfn << PAGE_SHIFT),

Use the PFN_DOWN macro please.
>+					       mfn_pte(mfn, PAGE_KERNEL),
>+					       0);
>+		BUG_ON(ret);

Ugh. What if you just stopped the allocation and returned NULL instead?

>+	    }
>+	}
>+
>+	ClearPageReserved(page);
>+	atomic_set(&page->_count,1);

Ugh.. Is that correct? Can you explain why you need to do that?

>+    }

What if rc is zero? Should we continue on with page?
>+    if (PageHighMem(page)) {
>+        for(i=0;i<512;i++)

Shouldn't you determine the value 512 from the 'order'?
>+	    inc_totalhigh_pages();
>+    }
>+    totalram_pages+=512;

Ditto.
>+    __SetPageHead(page);
>+    set_compound_order(page,order);
>+    return page;
>+ out1:

shouldn't you return the page back on the xen_hugepfn_list?

>+    return NULL;
>+}
> 
> #endif	/* CONFIG_XEN_DEBUG_FS */
>diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>index 1a1934a..c1781e7 100644
>--- a/arch/x86/xen/setup.c
>+++ b/arch/x86/xen/setup.c
>@@ -138,6 +138,40 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
> 	return released;
> }
> 
>+extern struct list_head xen_hugepfn_list;
>+static int __init reserve_hugepage_range(void)
>+{
>+    phys_addr_t temp,hugemem_start;
>+    unsigned long i,ret;

Is 'unsigned long' the right type for 'ret'?

>+    struct page *page;
>+    static unsigned long frame_list[1];
>+    struct xen_memory_reservation reservation = {
>+        .address_bits = 0,
>+	 .extent_order = 0,
>+	 .nr_extents   = 1,
>+	 .domid        = DOMID_SELF
>+    };
>+    if(!xen_pv_domain())
>+	 return -ENODEV;
>+    set_xen_guest_handle(reservation.extent_start, frame_list);
>+    hugemem_start=PFN_UP(xen_extra_mem_start);
>+    ret = HYPERVISOR_memory_op(20, &reservation);

20? There is no #define for this?
You are not checking to see if the hypercall failed.


>+    printk("a2k2: num of hugepages found =%lu\n",ret);
>+    temp=PFN_PHYS(PFN_UP(xen_extra_mem_start)+ret*512);
There has to be a better way of doing this.

You are assuming that the hugepage is always 2MB. What if it is 16MB?
Is there a way you can use any definitions of the architecture for this?

>+
>+    xen_extra_mem_start=temp;
>+    xen_extra_mem_size-=ret*2*1024*1024;

Ditto. You are assuming it is 2MB. It could be bigger (or smaller)
depending on the architecture.

>+    INIT_LIST_HEAD(&xen_hugepfn_list);
>+    for(i=0;i<ret;i++)
>+    {
>+        page=pfn_to_page(hugemem_start);
>+        list_add_tail(&page->lru,&xen_hugepfn_list);
>+        hugemem_start+=512;
>+    }
>+    return 0;
>+}
>+subsys_initcall(reserve_hugepage_range);
>+
> /**
>  * machine_specific_memory_setup - Hook for machine specific memory setup.
>  **/
>diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
>index aa4e368..168dd2b 100644
>--- a/include/xen/interface/memory.h
>+++ b/include/xen/interface/memory.h
>@@ -19,6 +19,7 @@
> #define XENMEM_increase_reservation 0
> #define XENMEM_decrease_reservation 1
> #define XENMEM_populate_physmap     6
>+#define XENMEM_populate_hugepage    19
> struct xen_memory_reservation {
> 
>     /*
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index f5a106e..7e38f73 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -18,6 +18,7 @@
> #include <linux/mutex.h>
> #include <linux/bootmem.h>
> #include <linux/sysfs.h>
>+#include <xen/xen.h>
> 
> #include <asm/page.h>
> #include <asm/pgtable.h>
>@@ -600,17 +620,19 @@ int PageHuge(struct page *page)
> 	return dtor == free_huge_page;
> }
> 
>+ 
> static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
> {
> 	struct page *page;
>-
> 	if (h->order >= MAX_ORDER)
> 		return NULL;
>-
>-	page = alloc_pages_exact_node(nid,
>-		htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
>-						__GFP_REPEAT|__GFP_NOWARN,
>-		huge_page_order(h));
>+       if(!xen_pv_domain())

Ugh. That is not the right way. You should be looking at using the pvops
struct interface so that you can override the baremetal default
implementation.

>+	        page = alloc_pages_exact_node(nid,
>+		        htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
>+		                                        __GFP_REPEAT|__GFP_NOWARN,
>+		        huge_page_order(h));
>+	else
>+	        page=allocate_hugepage(huge_page_order(h));
> 	if (page) {
> 		if (arch_prepare_hugepage(page)) {
> 			__free_pages(page, huge_page_order(h));
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: Hugepage support for Domains booting with 4KB pages
@ 2011-03-22 18:05 Keshav Darak
  0 siblings, 0 replies; 7+ messages in thread
From: Keshav Darak @ 2011-03-22 18:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


Konrad,

Thanks for reviewing the patch.

>> We have implemented hugepage support for guests in following manner
>> 
>> In
>>  our implementation we added a parameter hugepage_num which is specified
>>  in the config file of the DomU. It is the number of hugepages that the 
>> guest is guaranteed to receive whenever the kernel asks for hugepage by 
>> using its boot time parameter or reserving after booting (eg. Using echo
>>  XX > /proc/sys/vm/nr_hugepages). During creation of the domain we 
>> reserve MFN's for these hugepages and store them in the list. The 

>There is bootup option for normal Linux kernels to set that up. Was
>that something you could use?

Yes, it can be used too, to allocate the hugepages.

>> 
>> static inline int arch_prepare_hugepage(struct page *page)
>> {
>>index f46c340..00c489a 100644
>>--- a/arch/x86/mm/hugetlbpage.c
>>+++ b/arch/x86/mm/hugetlbpage.c
>>@@ -147,8 +147,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>             pte = (pte_t *) pmd_alloc(mm, pud, addr);
>>         }
>>     }
>>-    BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>>-
>>+    BUG_ON(pte && !pte_none(*pte) && !((*pte).pte & (_AT(pteval_t, 1)<<7)));

>Ugh. That is horrible.

>why can't you use 'pte_huge' ? Is it b/c of this
> * (We should never see kernel mappings with _PAGE_PSE set,
> * but we could see hugetlbfs mappings, I think.).
> */

Honestly, we don't know the exact reason, but when pte_huge was used the BUG_ON fired even though the PSE bit was set, so we had to rewrite the BUG_ON to test bit 7 directly. There may be better ways to do it, but we could not find out why pte_huge() was returning 0 even when the PTE was a huge PTE.
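
(For reference, bit 7 of the PTE is _PAGE_PSE, so the open-coded check in
the patch amounts to the following sketch; this is only an illustration,
assuming the mainline x86 native_pte_val() helper:)

/* Sketch: test _PAGE_PSE on the raw PTE value, which is what
 * ((*pte).pte & (_AT(pteval_t, 1) << 7)) in the patch does. */
#include <asm/pgtable.h>

static inline int raw_pte_is_huge(pte_t pte)
{
        return !!(native_pte_val(pte) & _PAGE_PSE);
}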


We will try to resolve the other issues with the patch as soon as possible.

--
Keshav Darak
Kaustubh Kabra
Ashwin Vasani
Aditya Gadre




^ permalink raw reply	[flat|nested] 7+ messages in thread

Thread overview: 7+ messages
2011-03-21 21:01 PATCH: Hugepage support for Domains booting with 4KB pages Keshav Darak
2011-03-21 21:31 ` Keir Fraser
2011-03-22 12:36   ` Keshav Darak
2011-03-22 14:07     ` Keir Fraser
  -- strict thread matches above, loose matches on Subject: below --
2011-03-22 18:05 Keshav Darak
2011-03-20 22:34 Keshav Darak
2011-03-22 16:49 ` Konrad Rzeszutek Wilk
