From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: virtio-dev-return-2420-cohuck=redhat.com@lists.oasis-open.org Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [66.179.20.138]) by lists.oasis-open.org (Postfix) with ESMTP id D02C35819266 for ; Wed, 26 Jul 2017 04:41:49 -0700 (PDT) Message-ID: <59788097.6010402@intel.com> Date: Wed, 26 Jul 2017 19:44:23 +0800 From: Wei Wang MIME-Version: 1.0 References: <20170719081311.GC26779@dhcp22.suse.cz> <596F4A0E.4010507@intel.com> <20170724090042.GF25221@dhcp22.suse.cz> <59771010.6080108@intel.com> <20170725112513.GD26723@dhcp22.suse.cz> <597731E8.9040803@intel.com> <20170725124141.GF26723@dhcp22.suse.cz> <286AC319A985734F985F78AFA26841F739283F62@shsmsx102.ccr.corp.intel.com> <20170725145333.GK26723@dhcp22.suse.cz> <5977FCDF.7040606@intel.com> <20170726102458.GH2981@dhcp22.suse.cz> In-Reply-To: <20170726102458.GH2981@dhcp22.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: [virtio-dev] Re: [PATCH v12 6/8] mm: support reporting free page blocks To: Michal Hocko Cc: "Michael S. Tsirkin" , "linux-kernel@vger.kernel.org" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "david@redhat.com" , "cornelia.huck@de.ibm.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" , "aarcange@redhat.com" , "amit.shah@redhat.com" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "virtio-dev@lists.oasis-open.org" , "yang.zhang.wz@gmail.com" , "quan.xu@aliyun.com" List-ID: On 07/26/2017 06:24 PM, Michal Hocko wrote: > On Wed 26-07-17 10:22:23, Wei Wang wrote: >> On 07/25/2017 10:53 PM, Michal Hocko wrote: >>> On Tue 25-07-17 14:47:16, Wang, Wei W wrote: >>>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote: >>>>> On Tue 25-07-17 19:56:24, Wei Wang wrote: >>>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote: >>>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote: >>>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote: >>>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote: >>>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote: >>>>>>>>> [... >>>>>> We don't need to do the pfn walk in the guest kernel. When the API >>>>>> reports, for example, a 2MB free page block, the API caller offers to >>>>>> the hypervisor the base address of the page block, and size=2MB, to >>>>>> the hypervisor. >>>>> So you want to skip pfn walks by regularly calling into the page allocator to >>>>> update your bitmap. If that is the case then would an API that would allow you >>>>> to update your bitmap via a callback be s sufficient? Something like >>>>> void walk_free_mem(int node, int min_order, >>>>> void (*visit)(unsigned long pfn, unsigned long nr_pages)) >>>>> >>>>> The function will call the given callback for each free memory block on the given >>>>> node starting from the given min_order. The callback will be strictly an atomic >>>>> and very light context. You can update your bitmap from there. >>>> I would need to introduce more about the background here: >>>> The hypervisor and the guest live in their own address space. The hypervisor's bitmap >>>> isn't seen by the guest. I think we also wouldn't be able to give a callback function >>> >from the hypervisor to the guest in this case. >>> How did you plan to use your original API which export struct page array >>> then? >> >> That's where the virtio-balloon driver comes in. It uses a shared ring >> mechanism to >> send the guest memory info to the hypervisor. >> >> We didn't expose the struct page array from the guest to the hypervisor. For >> example, when >> a 2MB free page block is reported from the free page list, the info put on >> the ring is just >> (base address of the 2MB continuous memory, size=2M). > So what exactly prevents virtio-balloon from using the above proposed > callback mechanism and export what is needed to the hypervisor? I thought about it more. Probably we can use the callback function with a little change like this: void walk_free_mem(void *opaque1, void (*visit)(void *opaque2, unsigned long pfn, unsigned long nr_pages)) { ... for_each_populated_zone(zone) { for_each_migratetype_order(order, type) { report_unused_page_block(zone, order, type, &page); // from patch 6 pfn = page_to_pfn(page); visit(opaque1, pfn, 1 << order); } } } The above function scans all the free list and directly sends each free page block to the hypervisor via the virtio_balloon callback below. No need to implement a bitmap. In virtio-balloon, we have the callback: void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn, unsigned long nr_pages) { struct virtio_balloon *vb = (struct virtio_balloon *)opaque; ...put the free page block to the the ring of vb; } What do you think? Best, Wei --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Wang Subject: Re: [PATCH v12 6/8] mm: support reporting free page blocks Date: Wed, 26 Jul 2017 19:44:23 +0800 Message-ID: <59788097.6010402@intel.com> References: <20170719081311.GC26779@dhcp22.suse.cz> <596F4A0E.4010507@intel.com> <20170724090042.GF25221@dhcp22.suse.cz> <59771010.6080108@intel.com> <20170725112513.GD26723@dhcp22.suse.cz> <597731E8.9040803@intel.com> <20170725124141.GF26723@dhcp22.suse.cz> <286AC319A985734F985F78AFA26841F739283F62@shsmsx102.ccr.corp.intel.com> <20170725145333.GK26723@dhcp22.suse.cz> <5977FCDF.7040606@intel.com> <20170726102458.GH2981@dhcp22.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20170726102458.GH2981@dhcp22.suse.cz> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: virtualization-bounces@lists.linux-foundation.org Errors-To: virtualization-bounces@lists.linux-foundation.org To: Michal Hocko Cc: "aarcange@redhat.com" , "virtio-dev@lists.oasis-open.org" , "amit.shah@redhat.com" , "kvm@vger.kernel.org" , "Michael S. Tsirkin" , "linux-kernel@vger.kernel.org" , "liliang.opensource@gmail.com" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "linux-mm@kvack.org" , "yang.zhang.wz@gmail.com" , "quan.xu@aliyun.com" , "cornelia.huck@de.ibm.com" , "pbonzini@redhat.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" List-Id: virtualization@lists.linuxfoundation.org On 07/26/2017 06:24 PM, Michal Hocko wrote: > On Wed 26-07-17 10:22:23, Wei Wang wrote: >> On 07/25/2017 10:53 PM, Michal Hocko wrote: >>> On Tue 25-07-17 14:47:16, Wang, Wei W wrote: >>>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote: >>>>> On Tue 25-07-17 19:56:24, Wei Wang wrote: >>>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote: >>>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote: >>>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote: >>>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote: >>>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote: >>>>>>>>> [... >>>>>> We don't need to do the pfn walk in the guest kernel. When the API >>>>>> reports, for example, a 2MB free page block, the API caller offers to >>>>>> the hypervisor the base address of the page block, and size=2MB, to >>>>>> the hypervisor. >>>>> So you want to skip pfn walks by regularly calling into the page allocator to >>>>> update your bitmap. If that is the case then would an API that would allow you >>>>> to update your bitmap via a callback be s sufficient? Something like >>>>> void walk_free_mem(int node, int min_order, >>>>> void (*visit)(unsigned long pfn, unsigned long nr_pages)) >>>>> >>>>> The function will call the given callback for each free memory block on the given >>>>> node starting from the given min_order. The callback will be strictly an atomic >>>>> and very light context. You can update your bitmap from there. >>>> I would need to introduce more about the background here: >>>> The hypervisor and the guest live in their own address space. The hypervisor's bitmap >>>> isn't seen by the guest. I think we also wouldn't be able to give a callback function >>> >from the hypervisor to the guest in this case. >>> How did you plan to use your original API which export struct page array >>> then? >> >> That's where the virtio-balloon driver comes in. It uses a shared ring >> mechanism to >> send the guest memory info to the hypervisor. >> >> We didn't expose the struct page array from the guest to the hypervisor. For >> example, when >> a 2MB free page block is reported from the free page list, the info put on >> the ring is just >> (base address of the 2MB continuous memory, size=2M). > So what exactly prevents virtio-balloon from using the above proposed > callback mechanism and export what is needed to the hypervisor? I thought about it more. Probably we can use the callback function with a little change like this: void walk_free_mem(void *opaque1, void (*visit)(void *opaque2, unsigned long pfn, unsigned long nr_pages)) { ... for_each_populated_zone(zone) { for_each_migratetype_order(order, type) { report_unused_page_block(zone, order, type, &page); // from patch 6 pfn = page_to_pfn(page); visit(opaque1, pfn, 1 << order); } } } The above function scans all the free list and directly sends each free page block to the hypervisor via the virtio_balloon callback below. No need to implement a bitmap. In virtio-balloon, we have the callback: void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn, unsigned long nr_pages) { struct virtio_balloon *vb = (struct virtio_balloon *)opaque; ...put the free page block to the the ring of vb; } What do you think? Best, Wei From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 155C76B02F4 for ; Wed, 26 Jul 2017 07:41:48 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id g69so4277023pfe.5 for ; Wed, 26 Jul 2017 04:41:48 -0700 (PDT) Received: from mga05.intel.com (mga05.intel.com. [192.55.52.43]) by mx.google.com with ESMTPS id v66si9384445pfb.416.2017.07.26.04.41.46 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Jul 2017 04:41:46 -0700 (PDT) Message-ID: <59788097.6010402@intel.com> Date: Wed, 26 Jul 2017 19:44:23 +0800 From: Wei Wang MIME-Version: 1.0 Subject: Re: [PATCH v12 6/8] mm: support reporting free page blocks References: <20170719081311.GC26779@dhcp22.suse.cz> <596F4A0E.4010507@intel.com> <20170724090042.GF25221@dhcp22.suse.cz> <59771010.6080108@intel.com> <20170725112513.GD26723@dhcp22.suse.cz> <597731E8.9040803@intel.com> <20170725124141.GF26723@dhcp22.suse.cz> <286AC319A985734F985F78AFA26841F739283F62@shsmsx102.ccr.corp.intel.com> <20170725145333.GK26723@dhcp22.suse.cz> <5977FCDF.7040606@intel.com> <20170726102458.GH2981@dhcp22.suse.cz> In-Reply-To: <20170726102458.GH2981@dhcp22.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: "Michael S. Tsirkin" , "linux-kernel@vger.kernel.org" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "david@redhat.com" , "cornelia.huck@de.ibm.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" , "aarcange@redhat.com" , "amit.shah@redhat.com" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "virtio-dev@lists.oasis-open.org" , "yang.zhang.wz@gmail.com" , "quan.xu@aliyun.com" On 07/26/2017 06:24 PM, Michal Hocko wrote: > On Wed 26-07-17 10:22:23, Wei Wang wrote: >> On 07/25/2017 10:53 PM, Michal Hocko wrote: >>> On Tue 25-07-17 14:47:16, Wang, Wei W wrote: >>>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote: >>>>> On Tue 25-07-17 19:56:24, Wei Wang wrote: >>>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote: >>>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote: >>>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote: >>>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote: >>>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote: >>>>>>>>> [... >>>>>> We don't need to do the pfn walk in the guest kernel. When the API >>>>>> reports, for example, a 2MB free page block, the API caller offers to >>>>>> the hypervisor the base address of the page block, and size=2MB, to >>>>>> the hypervisor. >>>>> So you want to skip pfn walks by regularly calling into the page allocator to >>>>> update your bitmap. If that is the case then would an API that would allow you >>>>> to update your bitmap via a callback be s sufficient? Something like >>>>> void walk_free_mem(int node, int min_order, >>>>> void (*visit)(unsigned long pfn, unsigned long nr_pages)) >>>>> >>>>> The function will call the given callback for each free memory block on the given >>>>> node starting from the given min_order. The callback will be strictly an atomic >>>>> and very light context. You can update your bitmap from there. >>>> I would need to introduce more about the background here: >>>> The hypervisor and the guest live in their own address space. The hypervisor's bitmap >>>> isn't seen by the guest. I think we also wouldn't be able to give a callback function >>> >from the hypervisor to the guest in this case. >>> How did you plan to use your original API which export struct page array >>> then? >> >> That's where the virtio-balloon driver comes in. It uses a shared ring >> mechanism to >> send the guest memory info to the hypervisor. >> >> We didn't expose the struct page array from the guest to the hypervisor. For >> example, when >> a 2MB free page block is reported from the free page list, the info put on >> the ring is just >> (base address of the 2MB continuous memory, size=2M). > So what exactly prevents virtio-balloon from using the above proposed > callback mechanism and export what is needed to the hypervisor? I thought about it more. Probably we can use the callback function with a little change like this: void walk_free_mem(void *opaque1, void (*visit)(void *opaque2, unsigned long pfn, unsigned long nr_pages)) { ... for_each_populated_zone(zone) { for_each_migratetype_order(order, type) { report_unused_page_block(zone, order, type, &page); // from patch 6 pfn = page_to_pfn(page); visit(opaque1, pfn, 1 << order); } } } The above function scans all the free list and directly sends each free page block to the hypervisor via the virtio_balloon callback below. No need to implement a bitmap. In virtio-balloon, we have the callback: void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn, unsigned long nr_pages) { struct virtio_balloon *vb = (struct virtio_balloon *)opaque; ...put the free page block to the the ring of vb; } What do you think? Best, Wei -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751577AbdGZLls (ORCPT ); Wed, 26 Jul 2017 07:41:48 -0400 Received: from mga14.intel.com ([192.55.52.115]:60177 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751031AbdGZLlq (ORCPT ); Wed, 26 Jul 2017 07:41:46 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,414,1496127600"; d="scan'208";a="115844660" Message-ID: <59788097.6010402@intel.com> Date: Wed, 26 Jul 2017 19:44:23 +0800 From: Wei Wang User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Michal Hocko CC: "Michael S. Tsirkin" , "linux-kernel@vger.kernel.org" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "david@redhat.com" , "cornelia.huck@de.ibm.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" , "aarcange@redhat.com" , "amit.shah@redhat.com" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "virtio-dev@lists.oasis-open.org" , "yang.zhang.wz@gmail.com" , "quan.xu@aliyun.com" Subject: Re: [PATCH v12 6/8] mm: support reporting free page blocks References: <20170719081311.GC26779@dhcp22.suse.cz> <596F4A0E.4010507@intel.com> <20170724090042.GF25221@dhcp22.suse.cz> <59771010.6080108@intel.com> <20170725112513.GD26723@dhcp22.suse.cz> <597731E8.9040803@intel.com> <20170725124141.GF26723@dhcp22.suse.cz> <286AC319A985734F985F78AFA26841F739283F62@shsmsx102.ccr.corp.intel.com> <20170725145333.GK26723@dhcp22.suse.cz> <5977FCDF.7040606@intel.com> <20170726102458.GH2981@dhcp22.suse.cz> In-Reply-To: <20170726102458.GH2981@dhcp22.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/26/2017 06:24 PM, Michal Hocko wrote: > On Wed 26-07-17 10:22:23, Wei Wang wrote: >> On 07/25/2017 10:53 PM, Michal Hocko wrote: >>> On Tue 25-07-17 14:47:16, Wang, Wei W wrote: >>>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote: >>>>> On Tue 25-07-17 19:56:24, Wei Wang wrote: >>>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote: >>>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote: >>>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote: >>>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote: >>>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote: >>>>>>>>> [... >>>>>> We don't need to do the pfn walk in the guest kernel. When the API >>>>>> reports, for example, a 2MB free page block, the API caller offers to >>>>>> the hypervisor the base address of the page block, and size=2MB, to >>>>>> the hypervisor. >>>>> So you want to skip pfn walks by regularly calling into the page allocator to >>>>> update your bitmap. If that is the case then would an API that would allow you >>>>> to update your bitmap via a callback be s sufficient? Something like >>>>> void walk_free_mem(int node, int min_order, >>>>> void (*visit)(unsigned long pfn, unsigned long nr_pages)) >>>>> >>>>> The function will call the given callback for each free memory block on the given >>>>> node starting from the given min_order. The callback will be strictly an atomic >>>>> and very light context. You can update your bitmap from there. >>>> I would need to introduce more about the background here: >>>> The hypervisor and the guest live in their own address space. The hypervisor's bitmap >>>> isn't seen by the guest. I think we also wouldn't be able to give a callback function >>> >from the hypervisor to the guest in this case. >>> How did you plan to use your original API which export struct page array >>> then? >> >> That's where the virtio-balloon driver comes in. It uses a shared ring >> mechanism to >> send the guest memory info to the hypervisor. >> >> We didn't expose the struct page array from the guest to the hypervisor. For >> example, when >> a 2MB free page block is reported from the free page list, the info put on >> the ring is just >> (base address of the 2MB continuous memory, size=2M). > So what exactly prevents virtio-balloon from using the above proposed > callback mechanism and export what is needed to the hypervisor? I thought about it more. Probably we can use the callback function with a little change like this: void walk_free_mem(void *opaque1, void (*visit)(void *opaque2, unsigned long pfn, unsigned long nr_pages)) { ... for_each_populated_zone(zone) { for_each_migratetype_order(order, type) { report_unused_page_block(zone, order, type, &page); // from patch 6 pfn = page_to_pfn(page); visit(opaque1, pfn, 1 << order); } } } The above function scans all the free list and directly sends each free page block to the hypervisor via the virtio_balloon callback below. No need to implement a bitmap. In virtio-balloon, we have the callback: void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn, unsigned long nr_pages) { struct virtio_balloon *vb = (struct virtio_balloon *)opaque; ...put the free page block to the the ring of vb; } What do you think? Best, Wei From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48991) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1daKhL-0005Pc-6w for qemu-devel@nongnu.org; Wed, 26 Jul 2017 07:41:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1daKhH-0004Br-W2 for qemu-devel@nongnu.org; Wed, 26 Jul 2017 07:41:51 -0400 Received: from mga09.intel.com ([134.134.136.24]:59834) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1daKhH-0004B5-Kp for qemu-devel@nongnu.org; Wed, 26 Jul 2017 07:41:47 -0400 Message-ID: <59788097.6010402@intel.com> Date: Wed, 26 Jul 2017 19:44:23 +0800 From: Wei Wang MIME-Version: 1.0 References: <20170719081311.GC26779@dhcp22.suse.cz> <596F4A0E.4010507@intel.com> <20170724090042.GF25221@dhcp22.suse.cz> <59771010.6080108@intel.com> <20170725112513.GD26723@dhcp22.suse.cz> <597731E8.9040803@intel.com> <20170725124141.GF26723@dhcp22.suse.cz> <286AC319A985734F985F78AFA26841F739283F62@shsmsx102.ccr.corp.intel.com> <20170725145333.GK26723@dhcp22.suse.cz> <5977FCDF.7040606@intel.com> <20170726102458.GH2981@dhcp22.suse.cz> In-Reply-To: <20170726102458.GH2981@dhcp22.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v12 6/8] mm: support reporting free page blocks List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Michal Hocko Cc: "Michael S. Tsirkin" , "linux-kernel@vger.kernel.org" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "david@redhat.com" , "cornelia.huck@de.ibm.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" , "aarcange@redhat.com" , "amit.shah@redhat.com" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "virtio-dev@lists.oasis-open.org" , "yang.zhang.wz@gmail.com" , "quan.xu@aliyun.com" On 07/26/2017 06:24 PM, Michal Hocko wrote: > On Wed 26-07-17 10:22:23, Wei Wang wrote: >> On 07/25/2017 10:53 PM, Michal Hocko wrote: >>> On Tue 25-07-17 14:47:16, Wang, Wei W wrote: >>>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote: >>>>> On Tue 25-07-17 19:56:24, Wei Wang wrote: >>>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote: >>>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote: >>>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote: >>>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote: >>>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote: >>>>>>>>> [... >>>>>> We don't need to do the pfn walk in the guest kernel. When the API >>>>>> reports, for example, a 2MB free page block, the API caller offers to >>>>>> the hypervisor the base address of the page block, and size=2MB, to >>>>>> the hypervisor. >>>>> So you want to skip pfn walks by regularly calling into the page allocator to >>>>> update your bitmap. If that is the case then would an API that would allow you >>>>> to update your bitmap via a callback be s sufficient? Something like >>>>> void walk_free_mem(int node, int min_order, >>>>> void (*visit)(unsigned long pfn, unsigned long nr_pages)) >>>>> >>>>> The function will call the given callback for each free memory block on the given >>>>> node starting from the given min_order. The callback will be strictly an atomic >>>>> and very light context. You can update your bitmap from there. >>>> I would need to introduce more about the background here: >>>> The hypervisor and the guest live in their own address space. The hypervisor's bitmap >>>> isn't seen by the guest. I think we also wouldn't be able to give a callback function >>> >from the hypervisor to the guest in this case. >>> How did you plan to use your original API which export struct page array >>> then? >> >> That's where the virtio-balloon driver comes in. It uses a shared ring >> mechanism to >> send the guest memory info to the hypervisor. >> >> We didn't expose the struct page array from the guest to the hypervisor. For >> example, when >> a 2MB free page block is reported from the free page list, the info put on >> the ring is just >> (base address of the 2MB continuous memory, size=2M). > So what exactly prevents virtio-balloon from using the above proposed > callback mechanism and export what is needed to the hypervisor? I thought about it more. Probably we can use the callback function with a little change like this: void walk_free_mem(void *opaque1, void (*visit)(void *opaque2, unsigned long pfn, unsigned long nr_pages)) { ... for_each_populated_zone(zone) { for_each_migratetype_order(order, type) { report_unused_page_block(zone, order, type, &page); // from patch 6 pfn = page_to_pfn(page); visit(opaque1, pfn, 1 << order); } } } The above function scans all the free list and directly sends each free page block to the hypervisor via the virtio_balloon callback below. No need to implement a bitmap. In virtio-balloon, we have the callback: void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn, unsigned long nr_pages) { struct virtio_balloon *vb = (struct virtio_balloon *)opaque; ...put the free page block to the the ring of vb; } What do you think? Best, Wei