linux-fsdevel.vger.kernel.org archive mirror
From: Qu Wenruo <wqu@suse.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Michal Hocko <mhocko@suse.com>
Cc: linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Cgroups <cgroups@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 0/2] mm: skip memcg for certain address space
Date: Thu, 18 Jul 2024 18:20:47 +0930
Message-ID: <2b48a095-97e6-43bc-9f7c-13dd31ce00b8@suse.com>
In-Reply-To: <ea6cfaf6-bdb8-48a4-bf59-9f54f36b112e@kernel.org>



On 2024/7/18 17:58, Vlastimil Babka (SUSE) wrote:
> On 7/18/24 9:52 AM, Qu Wenruo wrote:
>>
>>
>> On 2024/7/18 16:47, Vlastimil Babka (SUSE) wrote:
>>> On 7/18/24 12:38 AM, Qu Wenruo wrote:
>> [...]
>>>> Another question: I only see this hang with larger folios (order 2 vs
>>>> the old order 0) when adding them to the same address space.
>>>>
>>>> Does the folio order have anything to do with the problem, or does a
>>>> higher order just make it more likely?
>>>
>>> I didn't spot anything in the memcg charge path that would depend on the
>>> order directly, hm. Also what kernel version was showing these soft lockups?
>>
>> The previous rc kernel. IIRC it's v6.10-rc6.
>>
>> But that needs extra btrfs patches; without them btrfs still only does
>> order-0 allocations and adds the order-0 folios into the filemap.
>>
>> The extra patch just directs btrfs to allocate an order-2 folio (matching
>> the default 16K nodesize), then attach the folio to the metadata filemap.
>>
>> There is also extra code handling corner cases like different folio sizes etc.
> 
> Hm right, but the same code is triggered for high-order folios (at least for
> user mappable page cache) today by some filesystems AFAIK, so we should be
> seeing such lockups already? btrfs case might be special that it's for the
> internal node as you explain, but that makes no difference for
> filemap_add_folio(), right? Or is it the only user with GFP_NOFS? Also, is
> that passed as gfp directly or are there some extra scoped gfp restrictions
> involved? (memalloc_..._save()).

I'm not sure about other filesystems, but the hang case is very
metadata-heavy, and ALL folios in that btree inode filemap are order 2,
since we always allocate those order-2 folios using GFP_NOFAIL, and
attach them to the filemap using GFP_NOFAIL too.

I'm not sure whether other filesystems can end up in such a situation.

[...]
>> If I understand it correctly, we have implemented a release_folio()
>> callback, which does the btrfs metadata checks to determine whether we can
>> release the current folio, and avoids releasing folios that are still under
>> IO etc.
> 
> I see, thanks. Sounds like there might be some potentially suboptimal
> handling, in that the folio will appear inactive because there are no
> references that folio_check_references() can detect, unless there are some
> folio_mark_accessed() calls involved (I see some FGP_ACCESSED in btrfs so
> maybe that's fine enough), so reclaim could consider it often, only to be
> stopped by release_folio failing.

For the page-accessed part, btrfs handles it with the
mark_extent_buffer_accessed() call, which is made every time we try to
grab an extent buffer structure (the structure used to represent a
metadata block inside btrfs).

So the accessed-flag part should be fine, I guess?

Thanks,
Qu
> 
>>>
>>> (sorry if the questions seem noob, I'm not that much familiar with the page
>>> cache side of mm)
>>
>> No worries at all, I'm also a newbie to the whole mm part.
>>
>> Thanks,
>> Qu
>>
>>>
>>>> Thanks,
>>>> Qu
>>>
> 


Thread overview: 19+ messages
2024-07-10  1:07 [PATCH 0/2] mm: skip memcg for certain address space Qu Wenruo
2024-07-10  1:07 ` [PATCH 1/2] mm: make lru_gen_eviction() to handle folios without memcg info Qu Wenruo
2024-07-10  1:07 ` [PATCH 2/2] mm: allow certain address space to be not accounted by memcg Qu Wenruo
2024-07-17  7:42 ` [PATCH 0/2] mm: skip memcg for certain address space Qu Wenruo
2024-07-17 15:55 ` Vlastimil Babka (SUSE)
2024-07-17 16:14   ` Michal Hocko
2024-07-17 22:38     ` Qu Wenruo
2024-07-18  7:17       ` Vlastimil Babka (SUSE)
2024-07-18  7:25         ` Michal Hocko
2024-07-18  7:57           ` Qu Wenruo
2024-07-18  8:09             ` Michal Hocko
2024-07-18  8:10               ` Michal Hocko
2024-07-18  8:52               ` Qu Wenruo
2024-07-18  9:25                 ` Michal Hocko
2024-07-18  7:52         ` Qu Wenruo
2024-07-18  8:28           ` Vlastimil Babka (SUSE)
2024-07-18  8:50             ` Qu Wenruo [this message]
2024-07-18  9:19               ` Vlastimil Babka (SUSE)
2024-07-25  9:00   ` Qu Wenruo
