From: Gavin Shan
Reply-To: Gavin Shan
To: David Hildenbrand, Alexander Duyck
Cc: linux-mm, LKML, Andrew Morton, shan.gavin@gmail.com, Anshuman Khandual
Subject: Re: [RFC PATCH] mm/page_reporting: Adjust threshold according to MAX_ORDER
Date: Mon, 21 Jun 2021 15:16:54 +1000
Message-ID: <5ee628f8-772c-b1ed-557c-68d6a4a83415@redhat.com>
In-Reply-To: <249e5814-e644-3d82-9b38-232928af4dbd@redhat.com>
References: <20210601033319.100737-1-gshan@redhat.com>
 <76516781-6a70-f2b0-f3e3-da999c84350f@redhat.com>
 <0c0eb8c8-463d-d6f1-3cec-bbc0af0a229c@redhat.com>
 <63c06446-3b10-762c-3a29-464854b74e08@redhat.com>
 <0cb302f1-7fb6-e47c-e138-b7a03f2b02e2@redhat.com>
 <33b441b2-f10d-a7fb-8163-df2afbf6527d@redhat.com>
 <9e553b30-ce18-df65-bd3c-c68eaa4d0d91@redhat.com>
 <3adbcad8-1016-cf48-4574-799de0bba6e4@redhat.com>
 <249e5814-e644-3d82-9b38-232928af4dbd@redhat.com>

On 6/16/21 10:07 PM, David Hildenbrand wrote:
>> Indeed. 512MB pageblocks are rare, especially on systems which have been
>> up and running for a long time.
>>
>> The free page reporting starts from guest. Taking an extreme case: guest has
>> 512MB memory and it's backed by one THP on host. The free page reporting won't
>> work at all.
>>
>> Besides, it seems free page reporting isn't guaranteed to work all the time.
>> For example, on a system with 4KB base page size, freeing individual
>> 4KB pages can't come up with a free 2MB pageblock due to fragmentation.
>> In this case, the free'd page won't be reported immediately, but might be
>> reported after swapping or compaction due to memory pressure. The free page
>> isn't reported immediately at least.
>
> Exactly, it's a pure optimization that won't work, especially when guest
> memory is heavily fragmented.
There has to be a balance between reclaim= ing free memory in the hypervisor, degrading VM performance, and overhead= of the feature. >=20 > Further, there are no guarantees when a VM will reuse the memory again.= In the worst case, all VMs that reported free pages reuse memory at the = same time. In that case, one definitely needs sufficient backend memory i= n the hypervisor (-> swap) to not run out of memory, and performance will= be degraded. >=20 > As MST once phrased it, if the feature has a higher overhead than swapp= ing in the hypervisor, it's of little use. >=20 Thanks for the explanation and sorry again for late response, David. I to= ok last week as holiday and didn't work too much. However, it's nice to have unused pages returned back to the host. These = pages can be used by other VMs or applications running on the host. >> >> David, how about taking your suggestion to have different threshold si= ze only >> for arm64 (64KB base page size). The threshold will be smaller than pa= geblock_order >> for sure. There are two ways to do so and please let me know which is = the preferred >> way to go if you (and Alex) agree to do it. >> >> (a) Introduce CONFIG_PAGE_REPORTING_ORDER for individual archs to choo= se the >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 value. The threshold falls back to page= block_order if isn't configurated. >> (b) Rename PAGE_REPORTING_MIN_ORDER to PAGE_REPORTING_ORDER. archs can= decide >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 its value. If it's not provided by arch= , it falls back to pageblock_order. >> >=20 > I wonder if we could further define it as a (module/cmdline) parameter = and make it configurable when booting. The default could then be set base= d on CONFIG_PAGE_REPORTING_ORDER. CONFIG_PAGE_REPORTING_ORDER would defau= lt to pageblock_order (if easily possible) and could be special-cases to = arm64 with 64k. >=20 The formal patches are posted for review. I used macro PAGE_REPORTING_ORD= ER instead of CONFIG_PAGE_REPORTING_ORDER. 
The page reporting order (threshold) is also exported as a module
parameter, as you suggested.

>> By the way, I recently did some performance testing on different page sizes.
>> We get much more performance gain from 64KB (vs 4KB) page size in guest than
>> 512MB (vs 2MB) THP on host. It means the performance won't be affected too
>> much even if the 512MB THP is split on arm64 host.
>
> Yes, if one is even able to get 512MB THP populated in the hypervisor --
> because once again, 512MB THP are just a bad fit for many workloads.
>

Yeah, indeed :)

Thanks,
Gavin