From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
Date: Sun, 30 Jul 2017 19:20:47 +0300
Message-ID: <20170730191911-mutt-send-email-mst@kernel.org>
References: <59686EEB.8080805@intel.com> <20170723044036-mutt-send-email-mst@kernel.org> <59781119.8010200@intel.com> <20170726155856-mutt-send-email-mst@kernel.org> <597954E3.2070801@intel.com> <20170729020231-mutt-send-email-mst@kernel.org> <597C83CC.7060702@intel.com> <20170730043922-mutt-send-email-mst@kernel.org> <286AC319A985734F985F78AFA26841F739288D85@shsmsx102.ccr.corp.intel.com> <20170730191735-mutt-send-email-mst@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "linux-kernel@vger.kernel.org" , "qemu-devel@nongnu.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "david@redhat.com" , "cornelia.huck@de.ibm.com" , "akpm@linux-foundation.org" , "mgorman@techsingularity.net" , "aarcange@redhat.com" , "amit.shah@redhat.com" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "virtio-dev@lists.oasis-open.org" , "yang
To: "Wang, Wei W"
Return-path:
Content-Disposition: inline
In-Reply-To: <20170730191735-mutt-send-email-mst@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On Sun, Jul 30, 2017 at 07:18:33PM +0300, Michael S. Tsirkin wrote:
> On Sun, Jul 30, 2017 at 05:59:17AM +0000, Wang, Wei W wrote:
> > On Sunday, July 30, 2017 12:23 PM, Michael S. Tsirkin wrote:
> > > On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote:
> > > > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote:
> > > > > > > > > OK I thought this over. While we might need these new APIs
> > > > > > > > > in the future, I think that at the moment, there's a way to
> > > > > > > > > implement this feature that is significantly simpler. Just
> > > > > > > > > add each s/g as a separate input buffer.
> > > > > > > > Should it be an output buffer?
> > > > > > > Hypervisor overwrites these pages with zeroes. Therefore it is
> > > > > > > writeable by device: DMA_FROM_DEVICE.
> > > > > > Why would the hypervisor need to zero the buffer?
> > > > > The page is supplied to hypervisor and can lose the value that is
> > > > > there. That is the definition of writeable by device.
> > > >
> > > > I think for the free pages, it should be clear that they will be added
> > > > as output buffer to the device, because (as we discussed) they are
> > > > just hints, and some of them may be used by the guest after the report_ API is invoked.
> > > > The device/hypervisor should not use or discard them.
> > >
> > > Discarding contents is exactly what you propose doing if migration is going on,
> > > isn't it?
> >
> > That's actually a different concept. Please let me explain it with this example:
> >
> > The hypervisor receives the hint saying the guest PageX is a free page, but as we know,
> > after that report_ API exits, the guest kernel may take PageX to use, so PageX is not free
> > page any more. At this time, if the hypervisor writes to the page, that would crash the guest.
> > So, I think the cornerstone of this work is that the hypervisor should not touch the
> > reported pages.
> >
> > Best,
> > Wei
>
> That's a hypervisor implementation detail. From guest point of view,
> discarding contents can not be distinguished from writing old contents.
>

Besides, ignoring the free page tricks, consider regular ballooning.
We map page with DONTNEED then back with WILLNEED. Result is getting
a zero page. So at least one of deflate/inflate should be input.
I'd say both for symmetry.
--
MST
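
The DONTNEED/WILLNEED point in the message above can be illustrated from
userspace. What follows is only a sketch, not the balloon device's actual
code: it assumes Linux and a private anonymous mapping, and the function
name page_zeroed_after_dontneed() is made up for this illustration. It
fills one page with a pattern, drops it with madvise(MADV_DONTNEED), hints
MADV_WILLNEED, and then checks what the page reads back as.

```c
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS and madvise() on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a private anonymous page reads back as all zeroes after
 * MADV_DONTNEED, 0 if any byte is non-zero, and -1 on a setup error.
 * Assumes Linux semantics for MADV_DONTNEED.
 */
int page_zeroed_after_dontneed(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t len = (size_t)page_size;

    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* The "old contents" that the owner of the page had written. */
    memset(page, 0xAB, len);

    /* Analogous to inflate: give the backing memory away. */
    if (madvise(page, len, MADV_DONTNEED) != 0) {
        munmap(page, len);
        return -1;
    }

    /* Analogous to deflate: only a hint, so its result is ignored here. */
    (void)madvise(page, len, MADV_WILLNEED);

    /* Touching the page again faults in a fresh page; scan its bytes. */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(page, len);
    return all_zero;
}
```

Calling page_zeroed_after_dontneed() from a small main() on Linux is
expected to return 1: the 0xAB pattern is gone and the page reads as
zeroes. From the owner's side that is the same as the page having been
overwritten, which is the sense in which the thread calls such a buffer
"writeable by device".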