From: David Hildenbrand <david@redhat.com>
To: Lance Yang <ioworker0@gmail.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Jonathan Corbet <corbet@lwn.net>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Barry Song <baohua@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
Date: Sat, 13 Jul 2024 03:08:50 +0200	[thread overview]
Message-ID: <756c359e-bb8f-481e-a33f-163c729afa31@redhat.com> (raw)
In-Reply-To: <CAK1f24nCDZM8aa9z_ZtgLbdj695JJri01q2HJUJb9pJt2uqy=w@mail.gmail.com>

On 12.07.24 14:22, Lance Yang wrote:
> On Fri, Jul 12, 2024 at 11:00 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2024/7/11 15:29, Ryan Roberts wrote:
>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>>
>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>>
>>> This will provide some insight into the sizes of large folios being
>>> allocated for file-backed memory, and how often allocation is failing.
>>>
>>> All non-order-0 (and most order-0) folio allocations are currently done
>>> through filemap_alloc_folio(), and folios are charged in a subsequent
>>> call to filemap_add_folio(). So count file_fallback when allocation
>>> fails in filemap_alloc_folio() and count file_alloc or
>>> file_fallback_charge in filemap_add_folio(), based on whether charging
>>> succeeded or not. There are some users of filemap_add_folio() that
>>> allocate their own order-0 folio by other means, so we would not count
>>> an allocation failure in this case, but we also don't care about order-0
>>> allocations. This approach feels like it should be good enough and
>>> doesn't require any (impractically large) refactoring.
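>>>
>>> Roughly, the hooks look like this (a simplified sketch, not the literal
>>> diff; the MTHP_STAT_FILE_* item names are illustrative):
>>>
>>>   struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
>>>   {
>>>           struct folio *folio = folio_alloc(gfp, order); /* simplified */
>>>
>>>           /* allocation failed; caller falls back to a smaller order */
>>>           if (!folio)
>>>                   count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>>>           return folio;
>>>   }
>>>
>>>   int filemap_add_folio(struct address_space *mapping,
>>>                         struct folio *folio, pgoff_t index, gfp_t gfp)
>>>   {
>>>           int ret = mem_cgroup_charge(folio, NULL, gfp);
>>>
>>>           if (ret) {
>>>                   /* the folio was allocated, but charging it failed */
>>>                   count_mthp_stat(folio_order(folio),
>>>                                   MTHP_STAT_FILE_FALLBACK_CHARGE);
>>>                   return ret;
>>>           }
>>>           count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>>>           return __filemap_add_folio(mapping, folio, index, gfp, NULL);
>>>   }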
>>>
>>> The existing mTHP stats interface is reused to provide consistency to
>>> users. And because we are reusing the same interface, we can reuse the
>>> same infrastructure on the kernel side. The one small wrinkle is that
>>> the set of folio sizes supported by the pagecache is not identical to
>>> those supported by anon and shmem; pagecache supports order-1, unlike
>>> anon and shmem, and the max pagecache order may be less than PMD-size
>>> (see arm64 with 64K base pages), again unlike anon and shmem. So we now
>>> create a hugepages-*kB directory for the union of the sizes supported by
>>> all 3 memory types and populate it with the relevant stats and controls.
>>
>> Personally, I like the idea; it can help analyze the allocation of
>> large folios for the page cache.
>>
>> However, I have a slight concern about the consistency of the interface.
>>
>> For 64K, the field layout is:
>> ├── hugepages-64kB
>> │   ├── enabled
>> │   ├── shmem_enabled
>> │   └── stats
>> │       ├── anon_fault_alloc
>> │       ├── anon_fault_fallback
>> │       ├── anon_fault_fallback_charge
>> │       ├── file_alloc
>> │       ├── file_fallback
>> │       ├── file_fallback_charge
>> │       ├── shmem_alloc
>> │       ├── shmem_fallback
>> │       ├── shmem_fallback_charge
>> │       ├── split
>> │       ├── split_deferred
>> │       ├── split_failed
>> │       ├── swpout
>> │       └── swpout_fallback
>>
>> But for 8K (which only exists for the pagecache), you removed some
>> fields (of course, I understand why they are not supported).
>>
>> ├── hugepages-8kB
>> │   └── stats
>> │       ├── file_alloc
>> │       ├── file_fallback
>> │       └── file_fallback_charge
>>
>> This might not be user-friendly for some user-space parsing tools, since
>> interfaces that follow the same pattern are missing certain fields. Of course,
>> this might not be an issue if we have clear documentation describing the
>> differences here:)
>>
>> Another possible approach is to keep the same field layout for
>> consistency, but prohibit writes to the fields that the pagecache does
>> not support, and have any stats read from them return 0.
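>>
>> A minimal sketch of what that could look like (reusing the existing
>> per-size kobject and stat helpers; the support-check helper is
>> hypothetical and the error code is just a guess):
>>
>>   static ssize_t shmem_enabled_store(struct kobject *kobj,
>>                                      struct kobj_attribute *attr,
>>                                      const char *buf, size_t count)
>>   {
>>           int order = to_thpsize(kobj)->order;
>>
>>           /* refuse writes for sizes this memory type cannot use */
>>           if (!shmem_order_supported(order))   /* hypothetical helper */
>>                   return -EOPNOTSUPP;
>>           /* ... normal parsing and policy update ... */
>>           return count;
>>   }
>>
>>   static ssize_t shmem_alloc_show(struct kobject *kobj,
>>                                   struct kobj_attribute *attr, char *buf)
>>   {
>>           int order = to_thpsize(kobj)->order;
>>
>>           /* unsupported sizes always read back as 0 */
>>           if (!shmem_order_supported(order))
>>                   return sysfs_emit(buf, "0\n");
>>           return sysfs_emit(buf, "%lu\n",
>>                             sum_mthp_stat(order, MTHP_STAT_SHMEM_ALLOC));
>>   }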
> 
> I agree that maintaining a uniform field layout, especially at the stats
> level, might be necessary ;)
> 
> Keeping a consistent interface could future-proof the design. It allows
> for the possibility that features not currently supported for 8kB pages
> might be enabled in the future.

I'll just note that, with shmem/file effectively being disabled for 
order > 11, we'll also have entries there that are effectively unused.

Good question how we want to deal with that (stats are easy, but what 
about when we enable something? Maybe we should document that "enabled" 
is only effective when supported).

Hmmmmm

-- 
Cheers,

David / dhildenb



Thread overview: 28+ messages
2024-07-11  7:29 [PATCH v1 0/2] mTHP allocation stats for file-backed memory Ryan Roberts
2024-07-11  7:29 ` [PATCH v1 1/2] mm: Cleanup count_mthp_stat() definition Ryan Roberts
2024-07-11  8:20   ` Barry Song
2024-07-12  2:31   ` Baolin Wang
2024-07-12 11:57   ` Lance Yang
2024-07-11  7:29 ` [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations Ryan Roberts
2024-07-12  3:00   ` Baolin Wang
2024-07-12 12:22     ` Lance Yang
2024-07-13  1:08       ` David Hildenbrand [this message]
2024-07-13 10:45         ` Ryan Roberts
2024-07-16  8:31           ` Ryan Roberts
2024-07-16 10:19             ` David Hildenbrand
2024-07-16 11:14               ` Ryan Roberts
2024-07-17  8:02                 ` David Hildenbrand
2024-07-17  8:29                   ` Ryan Roberts
2024-07-17  8:44                     ` David Hildenbrand
2024-07-17  9:50                       ` Ryan Roberts
2024-07-17 10:03                         ` David Hildenbrand
2024-07-17 10:18                           ` Ryan Roberts
2024-07-17 10:25                             ` David Hildenbrand
2024-07-17 10:48                               ` Ryan Roberts
2024-07-13 11:00     ` Ryan Roberts
2024-07-13 12:54       ` Baolin Wang
2024-07-14  9:05         ` Ryan Roberts
2024-07-22  3:52           ` Baolin Wang
2024-07-22  7:36             ` Ryan Roberts
2024-07-12 22:44   ` kernel test robot
2024-07-15 13:55     ` Ryan Roberts
