From: Pasha Tatashin
Date: Thu, 2 Nov 2023 21:06:33 -0400
Subject: Re: [PATCH v5 1/1] mm: report per-page metadata information
To: Wei Xu
Cc: David Hildenbrand, Sourav Panda, corbet@lwn.net, gregkh@linuxfoundation.org, rafael@kernel.org, akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, rppt@kernel.org, rdunlap@infradead.org, chenlinxuan@uniontech.com, yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com, bhelgaas@google.com, ivan@cloudflare.com, yosryahmed@google.com, hannes@cmpxchg.org, shakeelb@google.com, kirill.shutemov@linux.intel.com, wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz, Liam.Howlett@oracle.com, surenb@google.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org, Greg Thelen

On Thu, Nov 2, 2023 at 4:22 PM Wei Xu wrote:
>
> On Thu, Nov 2, 2023 at 11:34 AM Pasha Tatashin
> wrote:
> >
> > > > > I could have sworn that I pointed that out in a previous version and
> > > > > requested to document that special case in the patch description. :)
> > > >
> > > > Sounds, good we will document that parts of per-page may not be part
> > > > of MemTotal.
> > >
> > > But this still doesn't answer how we can use the new PageMetadata
> > > field to help break down the runtime kernel overhead within MemUsed
> > > (MemTotal - MemFree).
> >
> > I am not sure it matters to the end users: they look at PageMetadata
> > with or without Page Owner, page_table_check, HugeTLB and it shows
> > exactly how much per-page overhead changed. Where the kernel allocated
> > that memory is not that important to the end user as long as that
> > memory became available to them.
> >
> > In addition, it is still possible to estimate the actual memblock part
> > of Per-page metadata by looking at /proc/zoneinfo:
> >
> > Memblock reserved per-page metadata: "present_pages - managed_pages"
>
> This assumes that all reserved memblocks are per-page metadata. As I

Right after boot, when all per-page metadata still comes from memblock,
we could determine what part of each zone's reserved memory is not
per-page metadata, and account for it in later calculations.

> mentioned earlier, it is not a robust approach.
>
> > If there is something big that we will allocate in that range, we
> > should probably also export it in some form.
> >
> > If this field does not fit in /proc/meminfo due to not fully being
> > part of MemTotal, we could just keep it under nodeN/, as a separate
> > file, as suggested by Greg.
> >
> > However, I think it is useful enough to have an easy system wide view
> > for Per-page metadata.
>
> It is fine to have this as a separate, informational sysfs file under
> nodeN/, outside of meminfo. I just don't think as in the current
> implementation (where PageMetadata is a mixture of buddy and memblock
> allocations), it can help with the use case that motivates this
> change, i.e. to improve the breakdown of the kernel overhead.
>
> > > > > > are allocated), so what would be the best way to export page metadata
> > > > > > without redefining MemTotal?
> > > > > > Keep the new field in /proc/meminfo but
> > > > > > be ok that it is not part of MemTotal or do two counters? If we do two
> > > > > > counters, we will still need to keep one that is a buddy allocator in
> > > > > > /proc/meminfo and the other one somewhere outside?
> > > > >
> > >
> > > I think the simplest thing to do now is to only report the buddy
> > > allocations of per-page metadata in meminfo. The meaning of the new
> >
> > This will cause PageMetadata to be 0 on 99% of the systems, and
> > essentially become useless to the vast majority of users.
>
> I don't think it is a major issue. There are other fields (e.g. Zswap)
> in meminfo that remain 0 when the feature is not used.

Since we are going to have two independent interfaces, the PageMetadata
field in /proc/meminfo and nodeN/page_metadata (a separate file, as
requested by Greg), how about this: /proc/meminfo reports only the
buddy allocator part, and nodeN/page_metadata reports the total
per-page overhead in the given node, including both memblock
reservations and buddy allocator memory?

Pasha