From: David Hildenbrand <david@redhat.com>
To: Jeff Layton <jlayton@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Joanne Koong <joannelkoong@gmail.com>,
Bernd Schubert <bernd.schubert@fastmail.fm>,
Zi Yan <ziy@nvidia.com>,
linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com,
Matthew Wilcox <willy@infradead.org>,
Oscar Salvador <osalvador@suse.de>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
Date: Fri, 10 Jan 2025 22:13:17 +0100
Message-ID: <ccefea7b-88a5-4472-94cd-1e320bf90b44@redhat.com>
In-Reply-To: <54ebdef4205781d3351e4a38e5551046482dbba0.camel@kernel.org>

On 10.01.25 21:28, Jeff Layton wrote:
> On Thu, 2025-01-09 at 12:22 +0100, David Hildenbrand wrote:
>> On 07.01.25 19:07, Shakeel Butt wrote:
>>> On Tue, Jan 07, 2025 at 09:34:49AM +0100, David Hildenbrand wrote:
>>>> On 06.01.25 19:17, Shakeel Butt wrote:
>>>>> On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
>>>>>> On Fri, 3 Jan 2025 at 21:31, David Hildenbrand <david@redhat.com> wrote:
>>>>>>> In any case, having movable pages be turned unmovable due to persistent
>>>>>>> writeback is something that must be fixed, not worked around. Likely a
>>>>>>> good topic for LSF/MM.
>>>>>>
>>>>>> Yes, this seems a good cross fs-mm topic.
>>>>>>
>>>>>> So the issue discussed here is that movable pages used for fuse
>>>>>> page-cache cause problems when memory needs to be compacted. The
>>>>>> problem is either that
>>>>>>
>>>>>> - the page is skipped, leaving the physical memory block unmovable
>>>>>>
>>>>>> - the compaction is blocked for an unbounded time
>>>>>>
>>>>>> While the new AS_WRITEBACK_INDETERMINATE could potentially make things
>>>>>> worse, the same thing happens on readahead, since the new page can be
>>>>>> locked for an indeterminate amount of time, which can also block
>>>>>> compaction, right?
>>>>
>>>> Yes, as memory hotplug + virtio-mem maintainer my bigger concern is these
>>>> pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there *must not be
>>>> unmovable pages ever*. Not triggered by an untrusted source, not triggered
>>>> by a trusted source.
>>>>
>>>> It's a violation of core-mm principles.
>>>
>>> The "must not be unmovable pages ever" is a very strong statement and we
>>> are violating it today and will keep violating it in future. Any
>>> page/folio under lock or writeback or have reference taken or have been
>>> isolated from their LRU is unmovable (most of the time for small period
>>> of time).
>>
>> ^ this: "small period of time" is what I meant.
>>
>> Most of these things are known to not be problematic: retrying a couple
>> of times makes it work, that's why migration keeps retrying.
>>
>> Again, as an example, we allow short-term O_DIRECT but disallow
>> long-term page pinning. I think there were concerns at some point if
>> O_DIRECT might also be problematic (I/O might take a while), but so far
>> it was not a problem in practice that would make CMA allocations easily
>> fail.
>>
>> vmsplice() is a known problem, because it behaves like O_DIRECT but
>> actually triggers long-term pinning; IIRC David Howells has this on his
>> todo list to fix. [I recall that seccomp disallows vmsplice by default
>> right now]
>>
>>> These operations are being done all over the place in kernel.
>>> Miklos gave an example of readahead.
>>
>> I assume you mean "unmovable for a short time", correct, or can you
>> point me at that specific example; I think I missed that.
>>
>>> The per-CPU LRU caches are another
>>> case where folios can get stuck for long period of time.
>>
>> Which is why memory offlining disables the lru cache. See
>> lru_cache_disable(). Other users that care about that drain the LRU on
>> all cpus.
>>
>>> Reclaim and
>>> compaction can isolate a lot of folios that they need to have
>>> too_many_isolated() checks. So, "must not be unmovable pages ever" is
>>> impractical.
>>
>> "must only be short-term unmovable", better?
>>
>
> Still a little ambiguous.
>
> How short is "short-term"? Are we talking milliseconds or minutes?

Usually a couple of seconds, max. For memory offlining, slightly longer
times are acceptable; other things (in particular compaction or CMA
allocations) will give up much faster.

>
> Imposing a hard timeout on writeback requests to unprivileged FUSE
> servers might give us a better guarantee of forward-progress, but it
> would probably have to be on the order of at least a minute or so to be
> workable.

Yes, and that might already be a bit too much, especially if stuck on
waiting for folio writeback ... so ideally we could find a way to
migrate these folios that are under writeback and it's not your ordinary
disk driver that responds rather quickly.

Right now we do it via these temp pages, and I can see how that's
undesirable.

For NFS etc. we probably never ran into this, because it's all used in
fairly well managed environments and, well, I assume NFS easily outdates
CMA and ZONE_MOVABLE :)

>>>
>>> The point is that, yes we should aim to improve things but in iterations
>>> and "must not be unmovable pages ever" is not something we can achieve
>>> in one step.
>>
>> I agree with the "improve things in iterations", but as
>> AS_WRITEBACK_INDETERMINATE has the FOLL_LONGTERM smell to it, I think we
>> are making things worse.
>>
>> And as this discussion has been going on for too long, to summarize my
>> point: there exist conditions where pages are short-term unmovable, and
>> possibly some to be fixed that turn pages long-term unmovable (e.g.,
>> vmsplice); that does not mean that we can freely add new conditions that
>> turn movable pages unmovable long-term or even forever.
>>
>> Again, this might be a good LSF/MM topic. If I would have the capacity I
>> would suggest a topic around which things are known to cause pages to be
>> short-term or long-term unmovable/unsplittable, and which can be
>> handled, which not. Maybe I'll find the time to propose that as a topic.
>>
>
>
> This does sound like great LSF/MM fodder! I predict that this session
> will run long! ;)

Heh, fully agreed! :)

--
Cheers,
David / dhildenb