Date: Tue, 4 Feb 2020 00:45:37 -0500
From: "Michael S. Tsirkin"
To: Tyler Sanderson
Cc: David Hildenbrand, Alexander Duyck, "Wang, Wei W",
 "virtualization@lists.linux-foundation.org", David Rientjes,
 "linux-mm@kvack.org", Michal Hocko
Subject: Re: Balloon pressuring page cache
Message-ID: <20200204004302-mutt-send-email-mst@kernel.org>
References: <91270a68-ff48-88b0-219c-69801f0c252f@redhat.com>
 <75d4594f-0864-5172-a0f8-f97affedb366@redhat.com>
 <286AC319A985734F985F78AFA26841F73E3F8A02@shsmsx102.ccr.corp.intel.com>
 <20200203080520-mutt-send-email-mst@kernel.org>
 <5ac131de8e3b7fc1fafd05a61feb5f6889aeb917.camel@linux.intel.com>
 <20200203120225-mutt-send-email-mst@kernel.org>

On Mon, Feb 03, 2020 at 12:32:05PM -0800, Tyler Sanderson wrote:
> There were apparently good reasons for moving away from OOM notifier callback:
> https://lkml.org/lkml/2018/7/12/314
> https://lkml.org/lkml/2018/8/2/322
>
> In particular the OOM notifier is worse than the shrinker because:
>
> 1. It is last-resort, which means the system has already gone through heroics
>    to prevent OOM.
>    Those heroic reclaim efforts are expensive and impact
>    application performance.
> 2. It lacks understanding of NUMA or other OOM constraints.
> 3. It has a higher potential for bugs due to the subtlety of the callback
>    context.
>
> Given the above, I think the shrinker API certainly makes the most sense _if_
> the balloon size is static. In that case memory should be reclaimed from the
> balloon early and proportionally to balloon size, which the shrinker API
> achieves.

OK that sounds like VIRTIO_BALLOON_F_FREE_PAGE_HINT then.

> However, if the balloon is inflating and intentionally causing memory pressure
> then this results in the inefficiency pointed out earlier.

And that sounds like VIRTIO_BALLOON_F_DEFLATE_ON_OOM.

> If the balloon is inflating but not causing memory pressure then there is no
> problem with either API.
>
> This suggests another route: rather than cause memory pressure to shrink the
> page cache, the balloon could issue the equivalent of
> "echo 3 > /proc/sys/vm/drop_caches".
> Of course ideally, we want to be more fine grained than "drop everything". We
> really want an API that says "drop everything that hasn't been accessed in the
> last 5 minutes".
>
> This would eliminate the need for the balloon to cause memory pressure at all
> which avoids the inefficiency in question. Furthermore, this pairs nicely with
> the FREE_PAGE_HINT feature.

Well we still do have a regression. So we probably should revert
for now, and separately look for better solutions.

>
> On Mon, Feb 3, 2020 at 9:04 AM Michael S. Tsirkin wrote:
>
> On Mon, Feb 03, 2020 at 05:34:20PM +0100, David Hildenbrand wrote:
> > On 03.02.20 17:18, Alexander Duyck wrote:
> > > On Mon, 2020-02-03 at 08:11 -0500, Michael S.
> > > Tsirkin wrote:
> > >> On Thu, Jan 30, 2020 at 11:59:46AM -0800, Tyler Sanderson wrote:
> > >>>
> > >>> On Thu, Jan 30, 2020 at 7:31 AM Wang, Wei W wrote:
> > >>>
> > >>>     On Thursday, January 30, 2020 11:03 PM, David Hildenbrand wrote:
> > >>>     > On 29.01.20 20:11, Tyler Sanderson wrote:
> > >>>     > >
> > >>>     > > On Wed, Jan 29, 2020 at 2:31 AM David Hildenbrand <david@redhat.com> wrote:
> > >>>     > >
> > >>>     > >     On 29.01.20 01:22, Tyler Sanderson via Virtualization wrote:
> > >>>     > >     > A primary advantage of virtio balloon over other memory reclaim
> > >>>     > >     > mechanisms is that it can pressure the guest's page cache into
> > >>>     > >     > shrinking.
> > >>>     > >     >
> > >>>     > >     > However, since the balloon driver changed to using the shrinker API
> > >>>     > >     > 71994620bb25a8b109388fefa9e99a28e355255a#diff-fd202acf694d9eba19c8c64da3e480c9>
> > >>>     > >     > this use case has become a bit more tricky. I'm wondering what the
> > >>>     > >     > intended device implementation is.
> > >>>     > >     >
> > >>>     > >     > When inflating the balloon against page cache (i.e.
> > >>>     > >     > no free memory remains) vmscan.c will both shrink page cache, but
> > >>>     > >     > also invoke the shrinkers -- including the balloon's shrinker. So
> > >>>     > >     > the balloon driver allocates memory which requires reclaim, vmscan
> > >>>     > >     > gets this memory by shrinking the balloon, and then the driver adds
> > >>>     > >     > the memory back to the balloon. Basically a busy no-op.
> > >>>
> > >>>     Per my understanding, the balloon allocation won’t invoke shrinker as
> > >>>     __GFP_DIRECT_RECLAIM isn't set, no?
> > >>>
> > >>> I could be wrong about the mechanism, but the device sees lots of activity on
> > >>> the deflate queue. The balloon is being shrunk. And this only starts once all
> > >>> free memory is depleted and we're inflating into page cache.
> > >>
> > >> So given this looks like a regression, maybe we should revert the
> > >> patch in question 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> > >> Besides, with VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > >> shrinker also ignores VIRTIO_BALLOON_F_MUST_TELL_HOST which isn't nice
> > >> at all.
> > >>
> > >> So it looks like all this rework introduced more issues than it
> > >> addressed ...
> > >>
> > >> I also CC Alex Duyck for an opinion on this.
> > >> Alex, what do you use to put pressure on page cache?
> > >
> > > I would say reverting probably makes sense. I'm not sure there is much
> > > value to having a shrinker running deflation when you are actively trying
> > > to increase the balloon.
> > > It would make more sense to wait until you are
> > > actually about to start hitting oom.
> >
> > I think the shrinker makes sense for free page hinting feature
> > (everything on free_page_list).
> >
> > So instead of only reverting, I think we should split it up and always
> > register the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT and the OOM
> > notifier (as before) for VIRTIO_BALLOON_F_MUST_TELL_HOST.
>
> OK ... I guess that means we need to fix shrinker to take
> VIRTIO_BALLOON_F_MUST_TELL_HOST into account correctly.
> Hosts ignore it at the moment but it's a fragile thing
> to do what it does and ignore used buffers.
>
> > (Of course, adapting what is being done in the shrinker and in the OOM
> > notifier)
> >
> > --
> > Thanks,
> >
> > David / dhildenb
>
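
To make the proposed split concrete, here is a rough user-space model of
"pick the reclaim hook from the negotiated features". It is only a sketch
and not driver code: has_feature() and reclaim_hooks() are hypothetical
names, and tying the OOM notifier to DEFLATE_ON_OOM (rather than
MUST_TELL_HOST) is an assumption taken from the inline comments above. The
bit positions are the ones I believe include/uapi/linux/virtio_balloon.h
defines.

```python
# User-space model only; nothing here is kernel API.
# Feature bit positions, assumed to match include/uapi/linux/virtio_balloon.h.
VIRTIO_BALLOON_F_MUST_TELL_HOST = 0
VIRTIO_BALLOON_F_DEFLATE_ON_OOM = 2
VIRTIO_BALLOON_F_FREE_PAGE_HINT = 3


def has_feature(features, bit):
    """Return True if feature bit number `bit` is set in the bitmask."""
    return bool(features & (1 << bit))


def reclaim_hooks(features):
    """Return the set of reclaim hooks a driver would register (hypothetical).

    - shrinker: only for free page hinting, so hinted pages can be handed
      back early under memory pressure.
    - oom_notifier: only for deflate-on-OOM, so the balloon is shrunk as a
      last resort and does not fight its own inflation.
    """
    hooks = set()
    if has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT):
        hooks.add("shrinker")
    if has_feature(features, VIRTIO_BALLOON_F_DEFLATE_ON_OOM):
        hooks.add("oom_notifier")
    return hooks


if __name__ == "__main__":
    both = (1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT) | (
        1 << VIRTIO_BALLOON_F_DEFLATE_ON_OOM
    )
    print(sorted(reclaim_hooks(both)))
```

With neither feature negotiated the model registers nothing, which matches
the case where the balloon is inflating without any deflate path. How
MUST_TELL_HOST should be honoured by either hook is left out on purpose,
since that is still the open question in this thread.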