From: David Hildenbrand <david@redhat.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	shakeel.butt@linux.dev, josef@toxicpanda.com,
	bernd.schubert@fastmail.fm, linux-mm@kvack.org,
	kernel-team@meta.com, Matthew Wilcox <willy@infradead.org>,
	Zi Yan <ziy@nvidia.com>, Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
Date: Thu, 3 Apr 2025 22:44:33 +0200
Message-ID: <075209ac-c659-485e-a220-83d4afed8a94@redhat.com>
In-Reply-To: <CAJnrk1a7DAijj09VQxJ1rjppgh=FMCm30cN_=wQijrz4B4nUtQ@mail.gmail.com>

On 03.04.25 21:09, Joanne Koong wrote:
> On Thu, Apr 3, 2025 at 2:18 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 03.04.25 05:31, Jingbo Xu wrote:
>>>
>>>
>>> On 4/3/25 5:34 AM, Joanne Koong wrote:
>>>> On Thu, Dec 19, 2024 at 5:05 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 23.11.24 00:23, Joanne Koong wrote:
>>>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the folio if
>>>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its
>>>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the
>>>>>> writeback may take an indeterminate amount of time to complete, and
>>>>>> waits may get stuck.
>>>>>>
>>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>>>> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>>>>>> ---
>>>>>>     mm/migrate.c | 5 ++++-
>>>>>>     1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>> index df91248755e4..fe73284e5246 100644
>>>>>> --- a/mm/migrate.c
>>>>>> +++ b/mm/migrate.c
>>>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>                  */
>>>>>>                 switch (mode) {
>>>>>>                 case MIGRATE_SYNC:
>>>>>> -                     break;
>>>>>> +                     if (!src->mapping ||
>>>>>> +                         !mapping_writeback_indeterminate(src->mapping))
>>>>>> +                             break;
>>>>>> +                     fallthrough;
>>>>>>                 default:
>>>>>>                         rc = -EBUSY;
>>>>>>                         goto out;
>>>>>
>>>>> Ehm, doesn't this mean that any fuse user can essentially completely
>>>>> block CMA allocations, memory compaction, memory hotunplug, memory
>>>>> poisoning... ?!
>>>>>
>>>>> That sounds very bad.
>>>>
>>>> I took a closer look at the migration code and the FUSE code. In the
>>>> migration code in migrate_folio_unmap(), I see that any MIGRATE_SYNC
>>>> mode folio lock holds will block migration until that folio is
>>>> unlocked. This is the snippet in migrate_folio_unmap() I'm looking at:
>>>>
>>>>           if (!folio_trylock(src)) {
>>>>                   if (mode == MIGRATE_ASYNC)
>>>>                           goto out;
>>>>
>>>>                   if (current->flags & PF_MEMALLOC)
>>>>                           goto out;
>>>>
>>>>                   if (mode == MIGRATE_SYNC_LIGHT && !folio_test_uptodate(src))
>>>>                           goto out;
>>>>
>>>>                   folio_lock(src);
>>>>           }
>>>>
>>
>> Right, I raised that also in my LSF/MM talk: waiting for readahead
>> currently implies waiting for the folio lock (there is no separate
>> readahead flag like there would be for writeback).
>>
>> The more I look into this and fuse, the more I realize that what fuse
>> does is just completely broken right now.
>>
>>>> If this is all that is needed for a malicious FUSE server to block
>>>> migration, then it makes no difference if AS_WRITEBACK_INDETERMINATE
>>>> mappings are skipped in migration. A malicious server has easier and
>>>> more powerful ways of blocking migration in FUSE than trying to do it
>>>> through writeback. For a malicious fuse server, we in fact wouldn't
>>>> even get far enough to hit writeback - a write triggers
>>>> aops->write_begin() and a malicious server would deliberately hang
>>>> forever while the folio is locked in write_begin().
>>>
>>> Indeed it seems possible.  A malicious FUSE server may already be
>>> capable of blocking the synchronous migration in this way.
>>
>> Yes, I think the conclusion is that we should advise people against
>> using unprivileged FUSE if they care about any features that rely on
>> page migration or page reclaim.
>>
>>>
>>>
>>>>
>>>> I looked into whether we could eradicate all the places in FUSE where
>>>> we may hold the folio lock for an indeterminate amount of time,
>>>> because if that is possible, then we should not add this writeback way
>>>> for a malicious fuse server to affect migration. But I don't think we
>>>> can, for example taking one case, the folio lock needs to be held as
>>>> we read in the folio from the server when servicing page faults, else
>>>> the page cache would contain stale data if there was a concurrent
>>>> write that happened just before, which would lead to data corruption
>>>> in the filesystem. Imo, we need a more encompassing solution for all
>>>> these cases if we're serious about preventing FUSE from blocking
>>>> migration, which probably looks like a globally enforced default
>>>> timeout of some sort or an mm solution for mitigating the blast radius
>>>> of how much memory can be blocked from migration, but that is outside
>>>> the scope of this patchset and is its own standalone topic.
>>
>> I'm still skeptical about timeouts: we can only get it wrong.
>>
>> I think a proper solution is making these pages movable, which does seem
>> feasible if (a) splice is not involved and (b) we can find a way to not
>> hold the folio lock forever e.g., in the readahead case.
>>
>> Maybe readahead would have to be handled more similar to writeback
>> (e.g., having a separate flag, or using a combination of e.g.,
>> writeback+uptodate flag, not sure)
>>
>> In both cases (readahead+writeback), we'd want to call into the FS to
>> migrate a folio that is under readahead/writeback. In case of fuse
>> without splice, a migration might be doable, and as discussed, splice
>> might just be avoided.
>>
>>>>
>>>> I don't see how this patch has any additional negative impact on
>>>> memory migration for the case of malicious servers that the server
>>>> can't already (and more easily) do. In fact, this patchset if anything
>>>> helps memory given that malicious servers now can't also trigger page
>>>> allocations for temp pages that would never get freed.
>>>>
>>>
>>> If that's true, maybe we could drop this patch out of this patchset? So
>>> that both before and after this patchset, synchronous migration could be
>>> blocked by a malicious FUSE server, while the usability of contiguous
>>> memory (CMA) won't be affected.
>>
>> I had exactly the same thought: if we can block forever on the folio
>> lock, there is no need for AS_WRITEBACK_INDETERMINATE. It's already all
>> completely broken.
> 
> I will resubmit this patchset and drop this patch.
> 
> I think we still need AS_WRITEBACK_INDETERMINATE for sync and legacy
> cgroupv1 reclaim scenarios:
> a) sync: sync waits on writeback so if we don't skip waiting on
> writeback for AS_WRITEBACK_INDETERMINATE mappings, then malicious fuse
> servers could make syncs hang. (There's no actual effect on sync
> behavior though with temp pages because even without temp pages, we
> return even though the data hasn't actually been synced to disk by the
> server yet)

Just curious: Are we sure there are no other cases where a malicious 
userspace could make some other folio_lock() hang forever either way?

IOW, just like for migration, isn't this just solving one part of the 
whole problem we are facing?
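
To make the comparison with migration concrete, here is a rough user-space
model of the two decisions quoted above: the folio-lock step from
migrate_folio_unmap() and the writeback step as changed by this patch. It
is only a sketch built from the snippets in this thread, not kernel code;
struct folio_model, the MODE_* values, the outcome names and both helper
functions are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MIGRATE_ASYNC / MIGRATE_SYNC_LIGHT / MIGRATE_SYNC. */
enum mode { MODE_ASYNC, MODE_SYNC_LIGHT, MODE_SYNC };

/* What the migration path would do next (illustrative names only). */
enum outcome { PROCEED, BAIL_OUT, BLOCK_ON_LOCK, WAIT_ON_WRITEBACK };

/* Hypothetical summary of the folio state the two checks look at. */
struct folio_model {
	bool locked_by_other;       /* folio_trylock() would fail */
	bool uptodate;              /* folio_test_uptodate() */
	bool under_writeback;       /* folio is under writeback */
	bool mapping_indeterminate; /* mapping set and AS_WRITEBACK_INDETERMINATE */
};

/* Mirrors the quoted "if (!folio_trylock(src))" block. */
static enum outcome lock_step(enum mode m, bool memalloc,
			      const struct folio_model *f)
{
	if (!f->locked_by_other)
		return PROCEED;
	if (m == MODE_ASYNC)
		return BAIL_OUT;
	if (memalloc)                /* current->flags & PF_MEMALLOC */
		return BAIL_OUT;
	if (m == MODE_SYNC_LIGHT && !f->uptodate)
		return BAIL_OUT;
	return BLOCK_ON_LOCK;        /* folio_lock(src): unbounded wait */
}

/* Mirrors the patched "switch (mode)" for a folio under writeback. */
static enum outcome writeback_step(enum mode m, const struct folio_model *f)
{
	if (!f->under_writeback)
		return PROCEED;
	if (m == MODE_SYNC && !f->mapping_indeterminate)
		return WAIT_ON_WRITEBACK; /* the "break" case */
	return BAIL_OUT;              /* -EBUSY */
}
```

With a folio that is locked by someone else, lock_step(MODE_SYNC, false, ...)
returns BLOCK_ON_LOCK no matter what mapping_indeterminate says, so the
-EBUSY that writeback_step() would return for an indeterminate mapping is
never reached.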

> 
> b) cgroupv1 reclaim: a correctly written fuse server can fall into
> this deadlock in one very specific scenario (e.g. if it's using legacy
> cgroupv1 and reclaim encounters a folio that already has the reclaim
> flag set and the caller didn't have __GFP_FS (or __GFP_IO if swap)
> set), where the deadlock is triggered by:
> * single-threaded FUSE server is in the middle of handling a request
> that needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in direct reclaim

Yes, that sounds reasonable.
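
For what it's worth, the four steps above form a plain wait-for cycle. A
tiny user-space sketch, assuming each actor waits on at most one other
actor; the actor names and has_wait_cycle() are hypothetical and only
illustrate the dependency chain, they are not kernel interfaces:

```c
#include <stdbool.h>

/* Hypothetical actors in the cgroupv1 reclaim scenario. */
enum actor { FUSE_SERVER, DIRECT_RECLAIM, FOLIO_WRITEBACK, NR_ACTORS };

/*
 * waits_on[a] is the actor that a is blocked on, or -1 if a can make
 * progress. Returns true if following the chain from start leads back
 * to start, i.e. start is part of a deadlock cycle.
 */
static bool has_wait_cycle(const int waits_on[NR_ACTORS], int start)
{
	int cur = start;

	for (int steps = 0; steps < NR_ACTORS; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return false; /* chain ends: someone can progress */
		if (cur == start)
			return true;  /* walked back to where we began */
	}
	return false;
}
```

Server -> reclaim -> writeback -> server is a cycle. If reclaim does not
wait on the folio (waits_on[DIRECT_RECLAIM] = -1), the chain ends and the
server can finish its allocation.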

-- 
Cheers,

David / dhildenb


