From: Juergen Gross
Subject: Re: [PATCHv1] xen/balloon: disable memory hotplug in PV guests
Date: Thu, 19 Mar 2015 10:55:37 +0100
Message-ID: <550A9D19.9040800@suse.com>
To: Daniel Kiper, David Vrabel
Cc: xen-devel@lists.xenproject.org, Boris Ostrovsky

On 03/18/2015 04:14 PM, Daniel Kiper wrote:
> On Wed, Mar 18, 2015 at 01:59:58PM +0000, David Vrabel wrote:
>> On 18/03/15 13:57, Juergen Gross wrote:
>>> On 03/18/2015 11:36 AM, David Vrabel wrote:
>>>> On 16/03/15 10:31, Juergen Gross wrote:
>>>>> On 03/16/2015 11:03 AM, Daniel Kiper wrote:
>>>>>> On Mon, Mar 16, 2015 at 06:35:04AM +0100, Juergen Gross wrote:
>>>>>>> On 03/11/2015 04:40 PM, Boris Ostrovsky wrote:
>>>>>>>> On 03/11/2015 10:42 AM, David Vrabel wrote:
>>>>>>>>> On 10/03/15 13:35, Boris Ostrovsky wrote:
>>>>>>>>>> On 03/10/2015 07:40 AM, David Vrabel wrote:
>>>>>>>>>>> On 09/03/15 14:10, David Vrabel wrote:
>>>>>>>>>>>> Memory hotplug doesn't work with PV guests because:
>>>>>>>>>>>>
>>>>>>>>>>>> a) The p2m cannot be expanded
>>>>>>>>>>>> to cover the new sections.
>>>>>>>>>>> Broken by 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to
>>>>>>>>>>> linear virtual mapped sparse p2m list).
>>>>>>>>>>>
>>>>>>>>>>> This one would be non-trivial to fix. We'd need a sparse set of
>>>>>>>>>>> vm_area's for the p2m or similar.
>>>>>>>>>>>
>>>>>>>>>>>> b) add_memory() builds page tables for the new sections which means
>>>>>>>>>>>> the new pages must have valid p2m entries (or a BUG occurs).
>>>>>>>>>>> After some more testing this appears to be broken by:
>>>>>>>>>>>
>>>>>>>>>>> 25b884a83d487fd62c3de7ac1ab5549979188482 (x86/xen: set regions above
>>>>>>>>>>> the end of RAM as 1:1) included 3.16.
>>>>>>>>>>>
>>>>>>>>>>> This one can be trivially fixed by setting the new sections in the p2m
>>>>>>>>>>> to INVALID_P2M_ENTRY before calling add_memory().
>>>>>>>>>> Have you tried 3.17? As I said yesterday, it worked for me (with 4.4
>>>>>>>>>> Xen).
>>>>>>>>> No. But there are three bugs that prevent it from working in 3.16+ so
>>>>>>>>> I'm really not sure how you had a working in a 3.17 PV guest.
>>>>>>>>
>>>>>>>> This is what I have:
>>>>>>>>
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@tst008 cat /mnt/lab/bootstrap-x86_64/test_small.xm
>>>>>>>> extra="console=hvc0 debug earlyprintk=xen "
>>>>>>>> kernel="/mnt/lab/bootstrap-x86_64/vmlinuz"
>>>>>>>> ramdisk="/mnt/lab/bootstrap-x86_64/initramfs.cpio.gz"
>>>>>>>> memory=1024
>>>>>>>> maxmem = 4096
>>>>>>>> vcpus=1
>>>>>>>> maxvcpus=3
>>>>>>>> name="bootstrap-x86_64"
>>>>>>>> on_crash="preserve"
>>>>>>>> vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
>>>>>>>> vnc=1
>>>>>>>> vnclisten="0.0.0.0"
>>>>>>>> disk=['phy:/dev/guests/bootstrap-x86_64,xvda,w']
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl create /mnt/lab/bootstrap-x86_64/test_small.xm
>>>>>>>> Parsing config from /mnt/lab/bootstrap-x86_64/test_small.xm
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep bootstrap-x86_64
>>>>>>>> bootstrap-x86_64 2 1024 1 -b---- 5.4
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@g-pvops uname -r
>>>>>>>> 3.17.0upstream
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@g-pvops dmesg|grep paravirtualized
>>>>>>>> [ 0.000000] Booting paravirtualized kernel on Xen
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal /proc/meminfo
>>>>>>>> MemTotal: 968036 kB
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl mem-set bootstrap-x86_64 2048
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep bootstrap-x86_64
>>>>>>>> bootstrap-x86_64 2 2048 1 -b---- 5.7
>>>>>>>> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal /proc/meminfo
>>>>>>>> MemTotal: 2016612 kB
>>>>>>>> [build@build-mk2 linux-boris]$
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regardless, it definitely doesn't work now because of the linear p2m
>>>>>>>>> change. What do you want to do about this?
>>>>>>>>
>>>>>>>> Since backing out p2m changes is not an option I guess your patch is the
>>>>>>>> only short-term alternative.
>>>>>>>>
>>>>>>>> But this still looks like a regression so perhaps Juergen can take a
>>>>>>>> look to see how it can be fixed.
>>>>>>>
>>>>>>> Hmm, the p2m list is allocated for the maximum memory size of the domain
>>>>>>> which is obtained from the hypervisor. In case of Dom0 it is read via
>>>>>>> XENMEM_maximum_reservation, for a domU it is based on the E820 memory
>>>>>>> map read via XENMEM_memory_map.
>>>>>>>
>>>>>>> I just tested it with a 4.0-rc1 domU kernel with 512MB initial memory
>>>>>>> and 4GB of maxmem. The E820 map looked like this:
>>>>>>>
>>>>>>> [ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
>>>>>>> [ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>>>>>>> [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000ffffffff] usable
>>>>>>>
>>>>>>> So the complete 4GB were included, like they should. The resulting p2m
>>>>>>> list is allocated in the needed size:
>>>>>>>
>>>>>>> [ 0.000000] p2m virtual area at ffffc90000000000, size is 800000
>>>>>>>
>>>>>>> So what is your problem here? Can you post the E820 map and the p2m map
>>>>>>> info for your failing domain, please?
>>>>>>
>>>>>> If you use memory hotplug then maxmem is not a limit from guest kernel
>>>>>> point of view (host still must allow that operation but it is another
>>>>>> not related issue). The problem is that p2m must be dynamically expandable
>>>>>> to support it. Earlier implementation supported that thing and memory
>>>>>> hotplug worked without any issue.
>>>>>
>>>>> Okay, now I get it.
>>>>>
>>>>> The problem with the earlier p2m implementation was that it was
>>>>> expandable to support only up to 512GB of RAM. So we need some way to
>>>>> tell the kernel how much virtual memory it should reserve for the p2m
>>>>> list if memory hotplug is enabled.
>>>>> We could:
>>>>>
>>>>> a) use a configurable maximum (e.g. for 512GB RAM as today)
>>>>
>>>> I would set the p2m virtual area to cover up to 512 GB (needs 1 GB of
>>>> virt space) for a 64-bit guest and up to 64 GB (needs 64 MB of virt
>>>> space) for a 32-bit guest.
>>>
>>> Are 64 GB for 32 bit guests a sensible default? This will need more than
>>> 10% of the available virtual kernel space (taking fixmap etc. into
>>> account). And a 64 GB sized 32 bit domain is hardly usable (you have to
>>> play dirty tricks to get it even running).
>>>
>>> I'd rather use a default of 4 GB which can be changed via a Kconfig
>>> option. For 64 bits the default of 512 GB is okay, but should be
>>> configurable as well.
>>
>> Ok.
>
> I have checked new p2m code and I think that this is reasonable solution too.

Do I need any patches for xl to be able to test this?

I did:

xl mem-max 2 4096
xl mem-set 2 4096

and get:

libxl: error: libxl.c:4779:libxl_set_memory_target: memory_dynamic_max must be less than or equal to memory_static_max

xl list -l shows:

...
    {
        "domid": 2,
        "config": {
            "c_info": {
                "type": "pv",
                "name": "sles11",
                "uuid": "c53944f1-1607-e367-278a-c7980b6cfdd0",
                "run_hotplug_scripts": "True"
            },
            "b_info": {
                "max_vcpus": 1,
                "avail_vcpus": [
                    0
                ],
                "numa_placement": "True",
                "max_memkb": 524288,
                "target_memkb": 524288,
                "video_memkb": 0,
                "shadow_memkb": 5120,
                "localtime": "False",
                "disable_migrate": "False",
                "blkdev_start": "xvda",
                "device_model_version": "qemu_xen",
                "device_model_stubdomain": "False",
                "sched_params": {

                },
...

which seems to reflect only the parameters from starting the domU:

name="sles11"
description="None"
uuid="c53944f1-1607-e367-278a-c7980b6cfdd0"
memory=512
maxmem=512
vcpus=1
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
keymap="de"
builder="linux"
bootloader="/usr/bin/pygrub"
bootargs=""
extra="xencons=tty "
disk=[ 'file:/home/sles11-2,xvda,w', ]
vif=[ 'mac=00:16:3e:06:a7:21,bridge=br0', ]

> David

I thought your name was Daniel?
;-) Juergen
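
P.S. For reference, the p2m sizes mentioned in this thread can be checked with a little arithmetic. This is only a sketch: it assumes 4 KiB pages and one p2m entry per page the size of an unsigned long (8 bytes on 64-bit, 4 bytes on 32-bit), which matches the figures quoted above. The name p2m_bytes is made up for illustration and is not a kernel function.

```python
PAGE_SIZE = 4096          # assumed x86 page size in bytes
GIB = 1 << 30
MIB = 1 << 20


def p2m_bytes(max_ram_bytes, entry_size):
    """Virtual space needed for a linear p2m list covering max_ram_bytes.

    entry_size is the assumed size of one p2m entry in bytes
    (8 for a 64-bit guest, 4 for a 32-bit guest).
    """
    pages = max_ram_bytes // PAGE_SIZE
    return pages * entry_size


# 4 GB maxmem on a 64-bit guest, as in the "size is 800000" boot message
print(hex(p2m_bytes(4 * GIB, 8)))

# Proposed 64-bit default: 512 GB of RAM, result in GiB
print(p2m_bytes(512 * GIB, 8) // GIB)

# 32-bit guest with 64 GB versus a 4 GB default, results in MiB
print(p2m_bytes(64 * GIB, 4) // MIB)
print(p2m_bytes(4 * GIB, 4) // MIB)
```

Under these assumptions a 4 GB default for 32-bit guests costs 4 MB of virtual space instead of 64 MB.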