Re: [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd

From: Ackerley Tng <ackerleytng@google.com>
To: Deepanshu Kartikey <kartikey406@gmail.com>
Cc: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com,
	 linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com
Subject: Re: [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd
Date: Wed, 04 Feb 2026 08:30:46 -0800
Message-ID: <diqza4xo31zd.fsf@google.com>
In-Reply-To: <CADhLXY4Dbe=rrD5z8uGd7kQ8v6WDrKaYsOeM=QPEN5g55YX-2w@mail.gmail.com> (message from Deepanshu Kartikey on Wed, 4 Feb 2026 18:06:25 +0530)

Deepanshu Kartikey <kartikey406@gmail.com> writes:

> On Wed, Feb 4, 2026 at 4:21 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>
>> filemap_{grab,get}_folio() and related functions, used since the early
>> stages of guest_memfd have determined the order of the folio to be
>> allocated by looking up mapping_min_folio_order(mapping). As identified by
>> syzbot, MADV_HUGEPAGE can be used to set the result of
>> mapping_min_folio_order() to a value greater than 0, leading to the

I was wrong here: AFAICT, MADV_HUGEPAGE does not actually update
mapping->flags, so it doesn't change the result of
mapping_min_folio_order().

MADV_HUGEPAGE only operates on the VMA and doesn't update the mapping's
min or max order, which are inode/mapping properties.

>> allocation of a huge page and subsequent WARNing.
>>
>> Refactor the allocation code of guest_memfd to directly use
>> filemap_add_folio(), specifying an order of 0.
>>

This refactoring is not actually required, since IIUC guest_memfd never
tries to update mapping->flags, and so mapping_min_folio_order() and
mapping_max_folio_order() return the default of 0.
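
As a rough userspace model of why an untouched mapping reports order 0:
the min and max folio orders are small bit fields packed into the
mapping's flags word, so a flags value that nobody ever writes decodes to
0 for both. The bit positions and the model_* names below are
assumptions for illustration (loosely following include/linux/pagemap.h
in recent trees), not the kernel's definitions; check the tree you
build against.

```c
#include <assert.h>

/*
 * Userspace model only. Field width and shifts are ASSUMED for
 * illustration; they are not copied from a specific kernel tree.
 */
#define MODEL_ORDER_BITS	5
#define MODEL_ORDER_MIN_SHIFT	16
#define MODEL_ORDER_MAX_SHIFT	21
#define MODEL_ORDER_MASK	((1UL << MODEL_ORDER_BITS) - 1)

/* Stands in for the flags word of struct address_space. */
struct model_mapping {
	unsigned long flags;
};

/* Decode the minimum folio order from the flags word. */
static unsigned int model_min_folio_order(const struct model_mapping *m)
{
	return (m->flags >> MODEL_ORDER_MIN_SHIFT) & MODEL_ORDER_MASK;
}

/* Decode the maximum folio order from the flags word. */
static unsigned int model_max_folio_order(const struct model_mapping *m)
{
	return (m->flags >> MODEL_ORDER_MAX_SHIFT) & MODEL_ORDER_MASK;
}

/* Only an explicit call like this one changes what the getters return. */
static void model_set_folio_order_range(struct model_mapping *m,
					unsigned int min, unsigned int max)
{
	assert(min <= max && max <= MODEL_ORDER_MASK);

	m->flags &= ~((MODEL_ORDER_MASK << MODEL_ORDER_MIN_SHIFT) |
		      (MODEL_ORDER_MASK << MODEL_ORDER_MAX_SHIFT));
	m->flags |= ((unsigned long)min << MODEL_ORDER_MIN_SHIFT) |
		    ((unsigned long)max << MODEL_ORDER_MAX_SHIFT);
}
```

In this model, guest_memfd corresponds to a mapping on which the setter
is never called, so both getters stay at 0 no matter what is done to a
VMA that maps it.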

>> This refactoring replaces the original functionality where FGP_LOCK and
>> FGP_CREAT are requested. Opportunistically drop functionality provided by
>> FGP_ACCESSED. guest_memfd folios don't care about accessed flags because
>> guest_memfd memory is unevictable and there is no storage to write back to.
>>
>> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
>> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> ---
>>  virt/kvm/guest_memfd.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index fdaea3422c30..0c58f6aa5609 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -135,23 +135,35 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>         /* TODO: Support huge pages. */
>>         struct mempolicy *policy;
>>         struct folio *folio;
>> +       gfp_t gfp;
>> +       int ret;
>>
>>         /*
>>          * Fast-path: See if folio is already present in mapping to avoid
>>          * policy_lookup.
>>          */
>> +repeat:
>>         folio = __filemap_get_folio(inode->i_mapping, index,
>>                                     FGP_LOCK | FGP_ACCESSED, 0);
>>         if (!IS_ERR(folio))
>>                 return folio;
>>
>> +       gfp = mapping_gfp_mask(inode->i_mapping);
>> +
>>         policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
>> -       folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>> -                                        FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
>> -                                        mapping_gfp_mask(inode->i_mapping), policy);
>> +       folio = filemap_alloc_folio(gfp, 0, policy);
>>         mpol_cond_put(policy);
>
> Hi Ackerley,
>
> Thanks for working on this bug! I've been investigating the same issue

Thank you for working on this bug too!

> and have a concern about the fast-path in your patch.
>
> In kvm_gmem_get_folio(), the fast-path returns any existing folio from
> the page cache without checking if it's a large folio:
>
>         folio = __filemap_get_folio(inode->i_mapping, index,
>                                     FGP_LOCK | FGP_ACCESSED, 0);
>         if (!IS_ERR(folio))
>                 return folio;  // <-- No size check here
>
> This means if a large folio was previously allocated (e.g., via

This is true, but I tried the above patch because back then I believed
that the filemap_add_folio() within the original
__filemap_get_folio_mpol() was the only place where folios get added to
the filemap.

I'm trying out another patch locally to disable khugepaged. I believe
the issue is that when MADV_HUGEPAGE is used on a guest_memfd VMA, it
indirectly enables khugepaged to work on guest_memfd folios, which we
don't want anyway.

I'm guessing now that the real fix is to disable khugepaged for
guest_memfd, and I will be trying out a few options through the rest of
today. My first thought was to set VM_NO_KHUGEPAGED (semantically
suitable), but it looks like that triggers some hugetlb-related
weirdness. I'm going to try VM_DONTEXPAND next.
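
A possible explanation for the hugetlb-related weirdness, sketched as a
userspace model: assuming VM_NO_KHUGEPAGED is a composite mask of the
form (VM_SPECIAL | VM_HUGETLB), with VM_DONTEXPAND being one member of
VM_SPECIAL, then setting the whole mask on a VMA also sets the hugetlb
bit, whereas setting a single member is enough for a "any bit of the
mask is set" check to match. The composition is an assumption to verify
against include/linux/mm.h, the bit positions are made up, and all
model_* / MODEL_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; NOT the kernel's actual values. */
enum {
	MODEL_VM_IO		= 1u << 0,
	MODEL_VM_DONTEXPAND	= 1u << 1,
	MODEL_VM_PFNMAP		= 1u << 2,
	MODEL_VM_MIXEDMAP	= 1u << 3,
	MODEL_VM_HUGETLB	= 1u << 4,
};

/* ASSUMED composition of the masks; verify against the real headers. */
#define MODEL_VM_SPECIAL \
	(MODEL_VM_IO | MODEL_VM_DONTEXPAND | MODEL_VM_PFNMAP | MODEL_VM_MIXEDMAP)
#define MODEL_VM_NO_KHUGEPAGED	(MODEL_VM_SPECIAL | MODEL_VM_HUGETLB)

/* The composite mask carries the hugetlb bit along with it. */
static_assert((MODEL_VM_NO_KHUGEPAGED & MODEL_VM_HUGETLB) != 0,
	      "composite mask includes the hugetlb bit");

/* Model of a khugepaged eligibility check: any bit of the mask excludes. */
static bool model_khugepaged_skips(unsigned int vm_flags)
{
	return (vm_flags & MODEL_VM_NO_KHUGEPAGED) != 0;
}

/* Model of code that treats the VMA as hugetlb if that bit is set. */
static bool model_looks_like_hugetlb(unsigned int vm_flags)
{
	return (vm_flags & MODEL_VM_HUGETLB) != 0;
}
```

Under these assumptions, a VMA flagged with only the VM_DONTEXPAND
equivalent is skipped by the khugepaged check without being mistaken
for a hugetlb VMA.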

Other notes: I trimmed the repro down by disabling calls 4, 5, 6 and 9,
which are not necessary for the repro. Calls 0, 1, 2 and 3 are part of a
regular usage pattern of guest_memfd, so the uncommon usages are call 7
(MADV_HUGEPAGE), the likely culprit, and call 8. I believe call 8 is just
the trigger, since mlock() actually faults the page in to userspace
through gup.
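
For reference, the two uncommon steps can be sketched in userspace as
below. This is not the syzbot reproducer: creating the VM and the
guest_memfd (calls 0 to 3) is omitted, and the helper name
advise_then_lock() is hypothetical. It takes any existing mapping, so
it can be tried on anonymous memory as well.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Apply the two "uncommon" steps to an existing mapping.
 * Returns 0 on success, or the negative errno of the first failing call.
 */
static int advise_then_lock(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);

	/* Call 7: marks the VMA as eligible for huge pages, nothing more. */
	if (madvise(addr, len, MADV_HUGEPAGE) != 0)
		return -errno;

	/* Call 8: mlock() populates the range, faulting the pages in. */
	if (mlock(addr, len) != 0)
		return -errno;

	return 0;
}
```

On a kernel built without transparent huge page support the madvise()
step is expected to fail, and mlock() is subject to RLIMIT_MEMLOCK, so
callers should treat a negative return as "step not applied" rather than
as a bug in the mapping.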

> madvise(MADV_HUGEPAGE)), subsequent faults will find and return it

madvise(MADV_HUGEPAGE) does not allocate folios; it only marks the VMA
to allow huge pages.

> from the fast-path, still triggering the WARN_ON_ONCE at line 416 in
> kvm_gmem_fault_user_mapping().
>
> The issue is that while your patch prevents *new* large folio
> allocations by hardcoding order=0 in filemap_alloc_folio(), it doesn't
> handle large folios that already exist in the page cache.
>
> Shouldn't we add a check for folio_test_large() on both the fast-path
> and slow-path to ensure we reject large folios regardless of how they
> were allocated? Something like:
>
>         folio = __filemap_get_folio(...);
>         if (!IS_ERR(folio))
>                 goto check_folio;
>         // ... allocation code ...
> check_folio:
>         if (folio_test_large(folio)) {
>                 folio_unlock(folio);
>                 folio_put(folio);
>                 return ERR_PTR(-E2BIG);

I saw your patch, but I feel that returning -E2BIG doesn't address the
root cause of the issue.

>         }
>
> Or am I missing something about how the page cache handles this case?
>
> Thanks,
> Deepanshu Kartikey

Thread overview: 4+ messages
     [not found] <cover.1770148108.git.ackerleytng@google.com>
2026-02-03 22:50 ` [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd Ackerley Tng
2026-02-03 22:00   ` [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
2026-02-04 12:36   ` [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd Deepanshu Kartikey
2026-02-04 16:30     ` Ackerley Tng [this message]
