From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
Date: Thu, 12 May 2022 23:29:38 +0000
References: <20220429201131.3397875-1-yosryahmed@google.com>
 <20220429201131.3397875-2-yosryahmed@google.com>
 <87ilqoi77b.wl-maz@kernel.org>
Mime-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Johannes Weiner
Cc: Yosry Ahmed, Marc Zyngier, Tejun Heo, Zefan Li, James Morse,
 Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Oliver Upton,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Linux Kernel Mailing List,
 linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
 kvmarm-FPEHb7Xf0XXUo1n7N8X6Ug@public.gmane.org

On Thu, May 12, 2022, Johannes Weiner wrote:
> Hey Yosry,
>
> On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier wrote:
> > > 115bae923ac8bb29ee635). You are saying that this is related to a
> > > 'workload', but given that the accounting is global, I fail to see how
> > > you can attribute these allocations on a particular VM.
> >
> > The main motivation is having the memcg stats, which give attribution
> > to workloads. If you think it's more appropriate, we can add it as a
> > memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> > per-memcg vmalloc stat")). The only reason I made this as a global
> > stat too is to be consistent with NR_PAGETABLE.
>
> Please no memcg-specific stats if a regular vmstat item is possible
> and useful at the system level as well, like in this case. It's extra
> memcg code, extra callbacks, and it doesn't have NUMA node awareness.
>
> > > What do you plan to do for IOMMU page tables? After all, they serve
> > > the exact same purpose, and I'd expect these to be handled the same
> > > way (i.e. why is this KVM specific?).
> >
> > The reason this was named NR_SECONDARY_PAGETABLE instead of
> > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > account other types of secondary page tables to this stat. It is just
> > that we are currently interested in the KVM MMU usage.
>
> Do you actually care at the supervisor level that this memory is used
> for guest page tables?

Hmm, yes?  KVM does have a decent number of large-ish allocations that aren't
for page tables, but except for page tables, the number/size of those
allocations scales linearly with either the number of vCPUs or the amount of
memory assigned to the VM (with no room for improvement barring KVM changes).

Off the top of my head, KVM's secondary page tables are the only allocations
that don't scale linearly, especially when nested virtualization is in use.

> It seems to me you primarily care that it is reported *somewhere*
> (hence the piggybacking off of NR_PAGETABLE at first). And whether
> it's page tables or iommu tables or whatever else allocated for the
> purpose of virtualization, it doesn't make much of a difference to the
> host/cgroup that is tracking it, right?
>
> (The proximity to nr_pagetable could also be confusing.
> A high page table count can be a hint to userspace to enable THP. It seems
> actionable in a different way than a high number of kvm page tables or
> iommu page tables.)

I don't know about iommu page tables, but on the KVM side a high count can also
be a good signal that enabling THP would be beneficial.  It's definitely
actionable in a different way though too.

> How about NR_VIRT? It's shorter, seems descriptive enough, less room
> for confusion, and is more easily extensible in the future.

I don't like NR_VIRT because VFIO/iommu can be used for non-virtualization
things, and we'd be lying by omission unless KVM (and other users) updates all
of its large-ish allocations to account them correctly.
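
To make the THP point concrete, here is a rough userspace sketch (not taken from the patch series) of how a host agent might act on such a counter once it is exposed. Everything specific in it is an assumption for illustration: the field name `SecPageTables` (the stat's user-visible name was still being debated in this thread), the /proc/meminfo-style "Name: value kB" layout as the place where it would appear, and the 1% threshold, which is an arbitrary placeholder rather than a recommended value.

```python
# Sketch only: field name, file layout, and threshold are assumptions.

def parse_meminfo_kb(text):
    """Parse 'Name:   123 kB' style lines into a {name: value} dict.

    Lines without a leading integer value are skipped; values that carry
    no unit (plain counters) are kept as-is.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            stats[name.strip()] = int(fields[0])
    return stats


def secondary_pagetable_share(stats, field="SecPageTables"):
    """Return secondary page table memory as a fraction of MemTotal.

    Returns None when the (hypothetical) field or MemTotal is missing,
    e.g. on a kernel that does not export the counter.
    """
    total = stats.get("MemTotal")
    if field not in stats or not total:
        return None
    return stats[field] / total


def suggests_thp(stats, field="SecPageTables", threshold=0.01):
    """Hint that THP may be worth enabling when the share is high.

    The threshold is a placeholder; a real policy would be tuned per host.
    """
    share = secondary_pagetable_share(stats, field)
    return share is not None and share >= threshold


# Made-up sample input, not measured data.
SAMPLE = """\
MemTotal:       16384000 kB
PageTables:        40960 kB
SecPageTables:    327680 kB
"""

if __name__ == "__main__":
    sample_stats = parse_meminfo_kb(SAMPLE)
    print(secondary_pagetable_share(sample_stats))
    print(suggests_thp(sample_stats))
```

On a live host the same helpers could be fed `open("/proc/meminfo").read()`. Taking the field name as a parameter keeps the sketch usable whichever name the stat ends up with, and returning None for a missing field lets the caller tell "counter not exported" apart from "counter is zero".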