Linux Btrfs filesystem development
From: Qu Wenruo <wqu@suse.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Michal Hocko <mhocko@suse.com>
Cc: linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Cgroups <cgroups@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 0/2] mm: skip memcg for certain address space
Date: Thu, 18 Jul 2024 17:22:41 +0930
Message-ID: <3cc3e652-e058-4995-8347-337ae605ebab@suse.com>
In-Reply-To: <9572fc2b-12b0-41a3-82dc-bb273bfdd51d@kernel.org>



On 2024/7/18 16:47, Vlastimil Babka (SUSE) wrote:
> On 7/18/24 12:38 AM, Qu Wenruo wrote:
[...]
>> Another question is, I only see this hang with larger folios (order 2 vs
>> the old order 0) when adding to the same address space.
>>
>> Does the folio order have anything to do with the problem, or does a
>> higher order just make it more likely?
> 
> I didn't spot anything in the memcg charge path that would depend on the
> order directly, hm. Also what kernel version was showing these soft lockups?

The previous rc kernel. IIRC it was v6.10-rc6.

But reproducing it needs extra btrfs patches; otherwise btrfs still only
does order-0 allocations and adds the order-0 folios into the filemap.

The extra patch simply directs btrfs to allocate an order-2 folio
(matching the default 16K nodesize) and then attach that folio to the
metadata filemap, with extra code handling corner cases like different
folio sizes etc.

> 
>> And finally, even without the hang problem, does it make any sense to
>> skip all the possible memcg charge completely, either to reduce latency
>> or just to reduce GFP_NOFAIL usage, for those user inaccessible inodes?
> 
> Is it common to even use the filemap code for such metadata that can't be
> really mapped to userspace?

At least XFS/ext4 don't use the filemap code for their metadata. One of 
the reasons btrfs does is that btrfs has pretty large metadata 
structures: not only the regular filesystem trees, but also the data 
checksums.

Even with the default CRC32C algorithm, that's 4 bytes per 4K of data.
Thus the metadata footprint can grow pretty large easily, and that's the 
reason why btrfs is still sticking to the filemap solution.

> How does it even interact with reclaim, do they
> become part of the page cache and are scanned by reclaim together with data
> that is mapped?

Yes, it's handled just like any other filemap: it also uses the page 
cache and all the lru/scanning machinery.

The major difference is that we only implement a small subset of the 
address space operations:

- write
- release
- invalidate
- migrate
- dirty (debug only, otherwise falls back to filemap_dirty_folio())

Note there are no read operations, as it's btrfs itself that triggers 
the metadata reads, so there is no read/readahead path.
Thus we are in full control of the page cache, e.g. we determine the 
folio size to be added into the filemap.

The filemap infrastructure provides two useful functionalities:

- (Page) cache
   So that we can easily determine whether we really need to read from
   the disk, which saves us a lot of random IO.

- Reclaim

And of course the page cache of the metadata inode is never 
cloned/shared with any user-accessible inode.

> How are the lru decisions handled if there are no references or PTE
> access bits? Or can they even be reclaimed at all, or is reclaim
> impossible because e.g. other open inodes may be pinning this metadata?

If I understand it correctly, we have implemented the release_folio() 
callback, which does the btrfs metadata checks to determine whether we 
can release the current folio, and avoids releasing folios that are 
still under IO etc.

> 
> (sorry if the questions seem noob, I'm not that much familiar with the page
> cache side of mm)

No worries at all, I'm also a newbie on the whole mm side.

Thanks,
Qu

> 
>> Thanks,
>> Qu
> 

Thread overview: 19+ messages
2024-07-10  1:07 [PATCH 0/2] mm: skip memcg for certain address space Qu Wenruo
2024-07-10  1:07 ` [PATCH 1/2] mm: make lru_gen_eviction() to handle folios without memcg info Qu Wenruo
2024-07-10  1:07 ` [PATCH 2/2] mm: allow certain address space to be not accounted by memcg Qu Wenruo
2024-07-17  7:42 ` [PATCH 0/2] mm: skip memcg for certain address space Qu Wenruo
2024-07-17 15:55 ` Vlastimil Babka (SUSE)
2024-07-17 16:14   ` Michal Hocko
2024-07-17 22:38     ` Qu Wenruo
2024-07-18  7:17       ` Vlastimil Babka (SUSE)
2024-07-18  7:25         ` Michal Hocko
2024-07-18  7:57           ` Qu Wenruo
2024-07-18  8:09             ` Michal Hocko
2024-07-18  8:10               ` Michal Hocko
2024-07-18  8:52               ` Qu Wenruo
2024-07-18  9:25                 ` Michal Hocko
2024-07-18  7:52         ` Qu Wenruo [this message]
2024-07-18  8:28           ` Vlastimil Babka (SUSE)
2024-07-18  8:50             ` Qu Wenruo
2024-07-18  9:19               ` Vlastimil Babka (SUSE)
2024-07-25  9:00   ` Qu Wenruo
