public inbox for linux-kernel@vger.kernel.org
From: Ackerley Tng <ackerleytng@google.com>
To: Deepanshu Kartikey <kartikey406@gmail.com>
Cc: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com,
	 linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com
Subject: Re: [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd
Date: Wed, 04 Feb 2026 08:30:46 -0800	[thread overview]
Message-ID: <diqza4xo31zd.fsf@google.com> (raw)
In-Reply-To: <CADhLXY4Dbe=rrD5z8uGd7kQ8v6WDrKaYsOeM=QPEN5g55YX-2w@mail.gmail.com> (message from Deepanshu Kartikey on Wed, 4 Feb 2026 18:06:25 +0530)

Deepanshu Kartikey <kartikey406@gmail.com> writes:

> On Wed, Feb 4, 2026 at 4:21 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>
>> filemap_{grab,get}_folio() and related functions, used since the early
>> stages of guest_memfd have determined the order of the folio to be
>> allocated by looking up mapping_min_folio_order(mapping). As identified by
>> syzbot, MADV_HUGEPAGE can be used to set the result of
>> mapping_min_folio_order() to a value greater than 0, leading to the

I was wrong here: as far as I can tell, MADV_HUGEPAGE does not actually
update mapping->flags, so it doesn't change the result of
mapping_min_folio_order().

MADV_HUGEPAGE only operates on the VMA; it doesn't update the mapping's
min or max folio order, which is an inode/mapping property.

>> allocation of a huge page and subsequent WARNing.
>>
>> Refactor the allocation code of guest_memfd to directly use
>> filemap_add_folio(), specifying an order of 0.
>>

This refactoring is not actually required, since IIUC guest_memfd never
tries to update mapping->flags, and so mapping_min_folio_order() and
mapping_max_folio_order() return the default of 0.

>> This refactoring replaces the original functionality where FGP_LOCK and
>> FGP_CREAT are requested. Opportunistically drop functionality provided by
>> FGP_ACCESSED. guest_memfd folios don't care about accessed flags because
>> guest_memfd memory is unevictable and there is no storage to write back to.
>>
>> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
>> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> ---
>>  virt/kvm/guest_memfd.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index fdaea3422c30..0c58f6aa5609 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -135,23 +135,35 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>         /* TODO: Support huge pages. */
>>         struct mempolicy *policy;
>>         struct folio *folio;
>> +       gfp_t gfp;
>> +       int ret;
>>
>>         /*
>>          * Fast-path: See if folio is already present in mapping to avoid
>>          * policy_lookup.
>>          */
>> +repeat:
>>         folio = __filemap_get_folio(inode->i_mapping, index,
>>                                     FGP_LOCK | FGP_ACCESSED, 0);
>>         if (!IS_ERR(folio))
>>                 return folio;
>>
>> +       gfp = mapping_gfp_mask(inode->i_mapping);
>> +
>>         policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
>> -       folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>> -                                        FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
>> -                                        mapping_gfp_mask(inode->i_mapping), policy);
>> +       folio = filemap_alloc_folio(gfp, 0, policy);
>>         mpol_cond_put(policy);
>
> Hi Ackerley,
>
> Thanks for working on this bug! I've been investigating the same issue

Thank you for working on this bug too!

> and have a concern about the fast-path in your patch.
>
> In kvm_gmem_get_folio(), the fast-path returns any existing folio from
> the page cache without checking if it's a large folio:
>
> folio = __filemap_get_folio(inode->i_mapping, index,
>                             FGP_LOCK | FGP_ACCESSED, 0);
> if (!IS_ERR(folio))
>         return folio;  // <-- No size check here
>
> This means if a large folio was previously allocated (e.g., via

This is true, but I tried the above patch because, at the time, I
believed that the filemap_add_folio() call within the original
__filemap_get_folio_mpol() was the only place where folios get added to
the filemap.

I'm trying out another patch locally to disable khugepaged. I believe
the issue is that using MADV_HUGEPAGE on a guest_memfd VMA indirectly
enables khugepaged to operate on guest_memfd folios, which we don't
want anyway.

My current guess is that the right fix is to disable khugepaged for
guest_memfd, and I will be trying out a few options through the rest of
today. My first thought was to set VM_NO_KHUGEPAGED (semantically the
best fit), but it looks like that triggers some hugetlb-related
weirdness. I'm going to try VM_DONTEXPAND next.

Other notes: I trimmed the repro down by disabling calls 4, 5, 6, and
9; those are not necessary for the repro. Calls 0, 1, 2, and 3 are part
of a regular usage pattern of guest_memfd, so the uncommon usages are
call 7 (MADV_HUGEPAGE, the likely culprit) and call 8. I believe call 8
is just the trigger, since mlock() actually faults the page into
userspace through GUP.

> madvise(MADV_HUGEPAGE)), subsequent faults will find and return it

madvise(MADV_HUGEPAGE) does not allocate folios; it only marks the VMA
as allowing huge pages.

> from the fast-path, still triggering the WARN_ON_ONCE at line 416 in
> kvm_gmem_fault_user_mapping().
>
> The issue is that while your patch prevents *new* large folio
> allocations by hardcoding order=0 in filemap_alloc_folio(), it doesn't
> handle large folios that already exist in the page cache.
>
> Shouldn't we add a check for folio_test_large() on both the fast-path
> and slow-path to ensure we reject large folios regardless of how they
> were allocated? Something like:
>
> folio = __filemap_get_folio(...);
> if (!IS_ERR(folio))
>         goto check_folio;
> // ... allocation code ...
> check_folio:
> if (folio_test_large(folio)) {
>         folio_unlock(folio);
>         folio_put(folio);
>         return ERR_PTR(-E2BIG);

I saw your patch; I feel that returning -E2BIG doesn't address the root
cause of the issue.

> }
>
> Or am I missing something about how the page cache handles this case?
>
> Thanks,
> Deepanshu Kartikey


Thread overview: 4+ messages
     [not found] <cover.1770148108.git.ackerleytng@google.com>
2026-02-03 22:50 ` [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd Ackerley Tng
2026-02-03 22:00   ` [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
2026-02-04 12:36   ` [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd Deepanshu Kartikey
2026-02-04 16:30     ` Ackerley Tng [this message]
