Date: Tue, 1 Aug 2017 18:38:54 +0300
From: "Michael S. Tsirkin"
To: Michal Hocko
Cc: ZhenweiPi, Wei Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	virtualization@lists.linux-foundation.org, mawilcox@microsoft.com,
	dave.hansen@intel.com, akpm@linux-foundation.org
Subject: Re: [PATCH] mm: don't zero ballooned pages
Message-ID: <20170801183518-mutt-send-email-mst@kernel.org>
References: <1501474413-21580-1-git-send-email-wei.w.wang@intel.com>
	<20170731065508.GE13036@dhcp22.suse.cz>
	<597EDF3D.8020101@intel.com>
	<20170731075153.GD15767@dhcp22.suse.cz>
	<32d9c53d-5310-25a7-0348-a6cf362a5dcd@youruncloud.com>
	<20170731083724.GF15767@dhcp22.suse.cz>
In-Reply-To: <20170731083724.GF15767@dhcp22.suse.cz>

On Mon, Jul 31, 2017 at 10:37:24AM +0200, Michal Hocko wrote:
> On Mon 31-07-17 16:23:26, ZhenweiPi wrote:
> > On 07/31/2017 03:51 PM, Michal Hocko wrote:
> >
> > >On Mon 31-07-17 15:41:49, Wei Wang wrote:
> > >>>On 07/31/2017 02:55 PM, Michal Hocko wrote:
> > >>>> >On Mon 31-07-17 12:13:33,
Wei Wang wrote:
> > >>>>> >>Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
> > >>>>> >>shouldn't be given to the host ksmd to scan.
> > >>>> >Could you point me where this MADV_DONTNEED is done, please?
> > >>>
> > >>>Sure. It's done in the hypervisor when the balloon pages are received.
> > >>>
> > >>>Please see line 40 at
> > >>>https://github.com/qemu/qemu/blob/master/hw/virtio/virtio-balloon.c
> > >And one more thing. I am not familiar with ksm much. But how is
> > >MADV_DONTNEED even helping? This madvise is not sticky - aka it will
> > >unmap the range without leaving any note behind. AFAICS the only way
> > >to have vma scanned is to have VM_MERGEABLE and that is an opt in:
> > >See Documentation/vm/ksm.txt
> > >"
> > >KSM only operates on those areas of address space which an application
> > >has advised to be likely candidates for merging, by using the madvise(2)
> > >system call: int madvise(addr, length, MADV_MERGEABLE).
> > >"
> > >
> > >So what exactly is going on here? The original patch looks highly
> > >suspicious as well. If somebody wants to make that memory mergeable then
> > >the user of that memory should zero them out.
> >
> > Kernel starts a kthread named "ksmd". ksmd scans the VM_MERGEABLE
> > memory and merges identical pages (identical means memcmp(page1,
> > page2, PAGESIZE) == 0).
> >
> > Guest cannot use ballooned pages, and these pages will not be accessed
> > for a long time. Kswapd on host will swap these pages out and get more
> > free memory.
> >
> > Rather than swapping, KSM has better performance. Presently pages in
> > the balloon device have random values, so they usually cannot be merged.
> > Enqueuing zero pages will resolve this problem.
> >
> > Because MADV_DONTNEED depends on host os capability and hypervisor capability,
> > I prefer to enqueue zero pages to balloon device and made this patch.

I think you should have the hypervisor zero them out if it wants that,
then. Seems cleaner.
>
> So why exactly are we zeroing pages (and pay some cost for that) in
> guest when we do not know what host actually does with them?

I suspect this is some special hypervisor that somehow benefits from
this patch. It should just use a feature bit for its special needs, I
think.

Michal is also exactly right that patches like this should come with
some performance numbers.

I'll post a patch adding virtio lists for mm/balloon_compaction.c
so that we notice when people tweak it like that.

> --
> Michal Hocko
> SUSE Labs
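
For readers following the thread, the merge argument quoted above (pages are
merge candidates only when memcmp() over a full page returns 0) can be
illustrated with a small userspace sketch. This is not kernel or QEMU code:
the 4096-byte page size and the names pages_identical(),
count_distinct_pages() and fill_stale() are assumptions made up for the
illustration, and the pairwise scan only stands in for whatever ksmd really
does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this illustration only. */
#define PAGE_SIZE_BYTES 4096u

/* Two pages are merge candidates only if their contents are byte-identical. */
bool pages_identical(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, PAGE_SIZE_BYTES) == 0;
}

/*
 * Count how many distinct page contents a buffer of npages pages holds.
 * A content-based merger would need roughly this many backing pages.
 * Hypothetical helper: a naive pairwise scan, chosen for clarity.
 */
size_t count_distinct_pages(const unsigned char *buf, size_t npages)
{
	size_t distinct = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < npages; i++) {
		bool seen = false;

		/* Compare page i against every earlier page. */
		for (size_t j = 0; j < i && !seen; j++)
			seen = pages_identical(buf + i * PAGE_SIZE_BYTES,
					       buf + j * PAGE_SIZE_BYTES);
		if (!seen)
			distinct++;
	}
	return distinct;
}

/*
 * Give every page a different fill byte, standing in for stale guest data.
 * The patterns are distinct only while npages stays below 256.
 */
void fill_stale(unsigned char *buf, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		memset(buf + i * PAGE_SIZE_BYTES, (int)(i + 1), PAGE_SIZE_BYTES);
}
```

With eight pages of differing stale content the count is eight, so nothing
can be shared; after zeroing the same buffer it drops to one. That is the
whole of the benefit claimed for enqueuing zero pages. It says nothing about
whether the guest or the hypervisor should pay for the zeroing, which is the
open question in this thread.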