From mboxrd@z Thu Jan 1 00:00:00 1970 From: Vasiliy G Tolstov Subject: Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version Date: Thu, 12 Aug 2010 11:44:01 +0400 Message-ID: <1281599041.5454.1.camel@vase.work> References: <20100812012224.GA16479@router-fw-old.local.net-space.pl> Reply-To: v.tolstov@selfip.ru Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <20100812012224.GA16479@router-fw-old.local.net-space.pl> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Daniel Kiper Cc: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org =D0=92 =D0=A7=D1=82=D0=B2, 12/08/2010 =D0=B2 03:22 +0200, Daniel Kiper =D0= =BF=D0=B8=D1=88=D0=B5=D1=82: > Hi, >=20 > Here is the third version of memory hotplug support > for Xen guests patch. This one cleanly applies to > git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git > repository, xen/memory-hotplug head. >=20 > On Fri, Aug 06, 2010 at 04:03:18PM +0400, Vasiliy G Tolstov wrote: > [...] > > Testing on sles 11 sp1 and opensuse 11.3. On results - send e-mail.. >=20 > Thx. >=20 > On Fri, Aug 06, 2010 at 12:34:08PM -0400, Konrad Rzeszutek Wilk wrote: > [...] > > > +static int allocate_additional_memory(unsigned long nr_pages) > > > +{ > > > + long rc; > > > + resource_size_t r_min, r_size; > > > + struct resource *r; > > > + struct xen_memory_reservation reservation =3D { > > > + .address_bits =3D 0, > > > + .extent_order =3D 0, > > > + .domid =3D DOMID_SELF > > > + }; > > > + unsigned long flags, i, pfn; > > > + > > > + if (nr_pages > ARRAY_SIZE(frame_list)) > > > + nr_pages =3D ARRAY_SIZE(frame_list); > > > + > > > + spin_lock_irqsave(&balloon_lock, flags); > > > + > > > + if (!is_memory_resource_reserved()) { > > > + > > > + /* > > > + * Look for first unused memory region starting at page > > > + * boundary. Skip last memory section created at boot time > > > + * becuase it may contains unused memory pages with PG_reserved > > > + * bit not set (online_pages require PG_reserved bit set). > > > + */ > > > + > > > + r =3D kzalloc(sizeof(struct resource), GFP_KERNEL); > > > > You are holding a spinlock here. Kzalloc can sleep >=20 > Thx. Fixed. >=20 > On Fri, Aug 06, 2010 at 10:42:48AM -0700, Jeremy Fitzhardinge wrote: > > > - PV on HVM mode is supported now; it was tested on > > > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git > > > repository, 2.6.34-pvhvm head, > > > > Good. I noticed you have some specific tests for "xen_pv_domain()" - > > are there many differences between pv and hvm? >=20 > No. Only those changes are needed where > xen_domain()/xen_pv_domain() is used. >=20 > > >>>+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG > > >>>+static inline unsigned long current_target(void) > > >>>+{ > > >>>+ return balloon_stats.target_pages; > > >>Why does this need its own version? > > >Because original version return values not bigger > > >then initial memory allocation which does not allow > > >memory hotplug to function. > > > > But surely they can be combined? A system without > > XEN_BALLOON_MEMORY_HOTPLUG is identical to a system with > > XEN_BALLOON_MEMORY_HOTPLUG which hasn't yet added any memory. Some > > variables may become constants (because memory can never be hot-added= ), > > but the logic of the code should be the same. >=20 > Done. >=20 > > Overall, this looks much better. The next step is to split this into= at > > least two patches: one for the core code, and one for the Xen bits. > > Each patch should do just one logical operation, so if you have sever= al > > distinct changes to the core code, put them in separate patches. >=20 > I will do that if this patch will be accepted. >=20 > > >diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > >index 38434da..beb1aa7 100644 > > >--- a/arch/x86/Kconfig > > >+++ b/arch/x86/Kconfig > > >@@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL > > > depends on ARCH_SPARSEMEM_ENABLE > > > > > > config ARCH_MEMORY_PROBE > > >- def_bool y > > >+ def_bool X86_64&& !XEN > > > depends on MEMORY_HOTPLUG > > > > The trouble with making anything statically depend on Xen at config t= ime > > is that you lose it even if you're not running under Xen. A pvops > > kernel can run on bare hardware as well, and we don't want to lose > > functionality (assume that CONFIG_XEN is always set, since distros do > > always set it). > > > > Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runt= ime > > when in a Xen context? >=20 > There is no simple way to do that. It requiers to do some > changes in drivers/base/memory.c code. I think it should > be done as kernel boot option (on by default to not break > things using this interface now). If it be useful for maintainers > of mm/memory_hotplug.c and drivers/base/memory.c code then > I could do that. Currently original arch/x86/Kconfig version > is restored. >=20 > > >+/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_H= OTPLUG > > >*/ > > >+static int __ref xen_add_memory(int nid, u64 start, u64 size) > > > > Could this be __meminit too then? >=20 > Good question. I looked throught the code and could > not find any simple explanation why mm/memory_hotplug.c > authors used __ref instead __meminit. Could you (mm/memory_hotplug.c > authors/maintainers) tell us why ??? >=20 > > >+{ > > >+ pg_data_t *pgdat =3D NULL; > > >+ int new_pgdat =3D 0, ret; > > >+ > > >+ lock_system_sleep(); > > > > What's this for? I see all its other users are in the memory hotplug > > code, but presumably they're concerned about a real S3 suspend. Do w= e > > care about that here? >=20 > Yes, because as I know S3 state is supported by Xen guests. >=20 > > Actually, this is nearly identical to mm/memory_hotplug.c:add_memory(= ). > > It looks to me like you should: > > > > * pull the common core out into mm/memory_hotplug.c:__add_memory() > > (or a better name) > > * make add_memory() do its > > register_memory_resource()/firmware_map_add_hotplug() around tha= t > > (assuming they're definitely unwanted in the Xen case) > > * make xen_add_memory() just call __add_memory() along with whatev= er > > else it needs (which is nothing?) > > > > That way you can export a high-level __add_memory function from > > memory_hotplug.c rather than the two internal detail functions. >=20 > Done. >=20 > > >+ r->name =3D "System RAM"; > > > > How about making it clear its Xen hotplug RAM? Or do things care abo= ut > > the "System RAM" name? >=20 > As I know no however as I saw anybody do not differentiate between > normal and hotplugged memory. I thought about that ealier however > stated that this soultion does not give us any real gain. That is why > I decided to use standard name for hotplugged memory. >=20 > If you have a questions please drop me a line. >=20 > Daniel >=20 > Signed-off-by: Daniel Kiper > --- > arch/x86/Kconfig | 2 +- > drivers/xen/balloon.c | 95 ++++++++------------------------= ------- > include/linux/memory_hotplug.h | 3 +- > mm/memory_hotplug.c | 55 ++++++++++++++++------- > 4 files changed, 61 insertions(+), 94 deletions(-) >=20 > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index beb1aa7..9458685 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL > depends on ARCH_SPARSEMEM_ENABLE > =20 > config ARCH_MEMORY_PROBE > - def_bool X86_64 && !XEN > + def_bool X86_64 > depends on MEMORY_HOTPLUG > =20 > config ILLEGAL_POINTER_VALUE > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index 31edc26..5120075 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -193,63 +193,11 @@ static void balloon_alarm(unsigned long unused) > } > =20 > #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG > -static inline unsigned long current_target(void) > -{ > - return balloon_stats.target_pages; > -} > - > static inline u64 is_memory_resource_reserved(void) > { > return balloon_stats.hotplug_start_paddr; > } > =20 > -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTP= LUG */ > -static int __ref xen_add_memory(int nid, u64 start, u64 size) > -{ > - pg_data_t *pgdat =3D NULL; > - int new_pgdat =3D 0, ret; > - > - lock_system_sleep(); > - > - if (!node_online(nid)) { > - pgdat =3D hotadd_new_pgdat(nid, start); > - ret =3D -ENOMEM; > - if (!pgdat) > - goto out; > - new_pgdat =3D 1; > - } > - > - /* call arch's memory hotadd */ > - ret =3D arch_add_memory(nid, start, size); > - > - if (ret < 0) > - goto error; > - > - /* we online node here. we can't roll back from here. */ > - node_set_online(nid); > - > - if (new_pgdat) { > - ret =3D register_one_node(nid); > - /* > - * If sysfs file of new node can't create, cpu on the node > - * can't be hot-added. There is no rollback way now. > - * So, check by BUG_ON() to catch it reluctantly.. > - */ > - BUG_ON(ret); > - } > - > - goto out; > - > -error: > - /* rollback pgdat allocation */ > - if (new_pgdat) > - rollback_node_hotadd(nid, pgdat); > - > -out: > - unlock_system_sleep(); > - return ret; > -} > - > static int allocate_additional_memory(unsigned long nr_pages) > { > long rc; > @@ -265,8 +213,6 @@ static int allocate_additional_memory(unsigned long= nr_pages) > if (nr_pages > ARRAY_SIZE(frame_list)) > nr_pages =3D ARRAY_SIZE(frame_list); > =20 > - spin_lock_irqsave(&balloon_lock, flags); > - > if (!is_memory_resource_reserved()) { > =20 > /* > @@ -280,7 +226,7 @@ static int allocate_additional_memory(unsigned long= nr_pages) > =20 > if (!r) { > rc =3D -ENOMEM; > - goto out; > + goto out_0; > } > =20 > r->name =3D "System RAM"; > @@ -293,12 +239,14 @@ static int allocate_additional_memory(unsigned lo= ng nr_pages) > =20 > if (rc < 0) { > kfree(r); > - goto out; > + goto out_0; > } > =20 > balloon_stats.hotplug_start_paddr =3D r->start; > } > =20 > + spin_lock_irqsave(&balloon_lock, flags); > + > pfn =3D PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.ho= tplug_size); > =20 > for (i =3D 0; i < nr_pages; ++i, ++pfn) > @@ -310,7 +258,7 @@ static int allocate_additional_memory(unsigned long= nr_pages) > rc =3D HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > =20 > if (rc < 0) > - goto out; > + goto out_1; > =20 > pfn =3D PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.ho= tplug_size); > =20 > @@ -323,9 +271,10 @@ static int allocate_additional_memory(unsigned lon= g nr_pages) > balloon_stats.hotplug_size +=3D rc << PAGE_SHIFT; > balloon_stats.current_pages +=3D rc; > =20 > -out: > +out_1: > spin_unlock_irqrestore(&balloon_lock, flags); > =20 > +out_0: > return rc < 0 ? rc : rc !=3D nr_pages; > } > =20 > @@ -337,11 +286,11 @@ static void hotplug_allocated_memory(void) > =20 > nid =3D memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr)= ; > =20 > - ret =3D xen_add_memory(nid, balloon_stats.hotplug_start_paddr, > + ret =3D add_registered_memory(nid, balloon_stats.hotplug_start_paddr, > balloon_stats.hotplug_size); > =20 > if (ret) { > - pr_err("%s: xen_add_memory: Memory hotplug failed: %i\n", > + pr_err("%s: add_registered_memory: Memory hotplug failed: %i\n", > __func__, ret); > goto error; > } > @@ -388,18 +337,6 @@ out: > balloon_stats.hotplug_size =3D 0; > } > #else > -static unsigned long current_target(void) > -{ > - unsigned long target =3D balloon_stats.target_pages; > - > - target =3D min(target, > - balloon_stats.current_pages + > - balloon_stats.balloon_low + > - balloon_stats.balloon_high); > - > - return target; > -} > - > static inline u64 is_memory_resource_reserved(void) > { > return 0; > @@ -407,13 +344,21 @@ static inline u64 is_memory_resource_reserved(voi= d) > =20 > static inline int allocate_additional_memory(unsigned long nr_pages) > { > + /* > + * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set. > + * balloon_stats.target_pages could not be bigger > + * than balloon_stats.current_pages because additional > + * memory allocation is not possible. > + */ > + balloon_stats.target_pages =3D balloon_stats.current_pages; > + > return 0; > } > =20 > static inline void hotplug_allocated_memory(void) > { > } > -#endif > +#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ > =20 > static int increase_reservation(unsigned long nr_pages) > { > @@ -553,7 +498,7 @@ static void balloon_process(struct work_struct *wor= k) > mutex_lock(&balloon_mutex); > =20 > do { > - credit =3D current_target() - balloon_stats.current_pages; > + credit =3D balloon_stats.target_pages - balloon_stats.current_pages; > =20 > if (credit > 0) { > if (balloon_stats.balloon_low || balloon_stats.balloon_high) > @@ -572,7 +517,7 @@ static void balloon_process(struct work_struct *wor= k) > } while ((credit !=3D 0) && !need_sleep); > =20 > /* Schedule more work if there is some still to be done. */ > - if (current_target() !=3D balloon_stats.current_pages) > + if (balloon_stats.target_pages !=3D balloon_stats.current_pages) > mod_timer(&balloon_timer, jiffies + HZ); > else if (is_memory_resource_reserved()) > hotplug_allocated_memory(); > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotp= lug.h > index 6652eae..37f1894 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -202,8 +202,7 @@ static inline int is_mem_section_removable(unsigned= long pfn, > } > #endif /* CONFIG_MEMORY_HOTREMOVE */ > =20 > -extern pg_data_t *hotadd_new_pgdat(int nid, u64 start); > -extern void rollback_node_hotadd(int nid, pg_data_t *pgdat); > +extern int add_registered_memory(int nid, u64 start, u64 size); > extern int add_memory(int nid, u64 start, u64 size); > extern int arch_add_memory(int nid, u64 start, u64 size); > extern int remove_memory(u64 start, u64 size); > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 143e03c..48a65bb 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -453,7 +453,7 @@ int online_pages(unsigned long pfn, unsigned long n= r_pages) > #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ > =20 > /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTP= LUG */ > -pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > +static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > { > struct pglist_data *pgdat; > unsigned long zones_size[MAX_NR_ZONES] =3D {0}; > @@ -473,32 +473,21 @@ pg_data_t __ref *hotadd_new_pgdat(int nid, u64 st= art) > =20 > return pgdat; > } > -EXPORT_SYMBOL_GPL(hotadd_new_pgdat); > =20 > -void rollback_node_hotadd(int nid, pg_data_t *pgdat) > +static void rollback_node_hotadd(int nid, pg_data_t *pgdat) > { > arch_refresh_nodedata(nid, NULL); > arch_free_nodedata(pgdat); > return; > } > -EXPORT_SYMBOL_GPL(rollback_node_hotadd); > - > =20 > /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTP= LUG */ > -int __ref add_memory(int nid, u64 start, u64 size) > +static int __ref __add_memory(int nid, u64 start, u64 size) > { > pg_data_t *pgdat =3D NULL; > int new_pgdat =3D 0; > - struct resource *res; > int ret; > =20 > - lock_system_sleep(); > - > - res =3D register_memory_resource(start, size); > - ret =3D -EEXIST; > - if (!res) > - goto out; > - > if (!node_online(nid)) { > pgdat =3D hotadd_new_pgdat(nid, start); > ret =3D -ENOMEM; > @@ -535,11 +524,45 @@ error: > /* rollback pgdat allocation and others */ > if (new_pgdat) > rollback_node_hotadd(nid, pgdat); > - if (res) > - release_memory_resource(res); > =20 > out: > + return ret; > +} > + > +int __ref add_registered_memory(int nid, u64 start, u64 size) > +{ > + int ret; > + > + lock_system_sleep(); > + ret =3D __add_memory(nid, start, size); > unlock_system_sleep(); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(add_registered_memory); > + > +int __ref add_memory(int nid, u64 start, u64 size) > +{ > + int ret =3D -EEXIST; > + struct resource *res; > + > + lock_system_sleep(); > + > + res =3D register_memory_resource(start, size); > + > + if (!res) > + goto out; > + > + ret =3D __add_memory(nid, start, size); > + > + if (!ret) > + goto out; > + > + release_memory_resource(res); > + > +out: > + unlock_system_sleep(); > + > return ret; > } > EXPORT_SYMBOL_GPL(add_memory); >=20 Can You provide patch to sles 11 sp1 ? I found that sles has some modification to the kernel and patch does not apply cleanly =3D(. --=20 Vasiliy G Tolstov Selfip.Ru