From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 04 Feb 2026 08:30:46 -0800
In-Reply-To: (message from Deepanshu Kartikey on Wed, 4 Feb 2026 18:06:25 +0530)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Subject: Re: [PATCH 1/2] KVM: guest_memfd: Always use order 0 when allocating for guest_memfd
From: Ackerley Tng
To: Deepanshu Kartikey
Cc: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com
Content-Type: text/plain; charset="utf-8"

Deepanshu Kartikey writes:

> On Wed, Feb 4, 2026 at 4:21 AM Ackerley Tng wrote:
>>
>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>
>> filemap_{grab,get}_folio() and related functions, used since the early
>> stages of guest_memfd, have determined the order of the folio to be
>> allocated by looking up mapping_min_folio_order(mapping). As identified by
>> syzbot, MADV_HUGEPAGE can be used to set the result of
>> mapping_min_folio_order() to a value greater than 0, leading to the

I was wrong here: MADV_HUGEPAGE does not actually update mapping->flags
AFAICT, so it doesn't change the result of mapping_min_folio_order().
MADV_HUGEPAGE only operates on the VMA and doesn't update the mapping's
min or max order, which is an inode/mapping property.

>> allocation of a huge page and subsequent WARNing.
>>
>> Refactor the allocation code of guest_memfd to directly use
>> filemap_add_folio(), specifying an order of 0.
>>

This refactoring is not actually required, since IIUC guest_memfd never
tries to update mapping->flags, and so mapping_min_folio_order() and
mapping_max_folio_order() return the default of 0.

>> This refactoring replaces the original functionality where FGP_LOCK and
>> FGP_CREAT are requested. Opportunistically drop functionality provided by
>> FGP_ACCESSED. guest_memfd folios don't care about accessed flags because
>> guest_memfd memory is unevictable and there is no storage to write back to.
>>
>> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
>> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Signed-off-by: Ackerley Tng
>> ---
>>  virt/kvm/guest_memfd.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index fdaea3422c30..0c58f6aa5609 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -135,23 +135,35 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>  	/* TODO: Support huge pages. */
>>  	struct mempolicy *policy;
>>  	struct folio *folio;
>> +	gfp_t gfp;
>> +	int ret;
>>
>>  	/*
>>  	 * Fast-path: See if folio is already present in mapping to avoid
>>  	 * policy_lookup.
>>  	 */
>> +repeat:
>>  	folio = __filemap_get_folio(inode->i_mapping, index,
>>  				    FGP_LOCK | FGP_ACCESSED, 0);
>>  	if (!IS_ERR(folio))
>>  		return folio;
>>
>> +	gfp = mapping_gfp_mask(inode->i_mapping);
>> +
>>  	policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
>> -	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>> -					 FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
>> -					 mapping_gfp_mask(inode->i_mapping), policy);
>> +	folio = filemap_alloc_folio(gfp, 0, policy);
>>  	mpol_cond_put(policy);
>
> Hi Ackerley,
>
> Thanks for working on this bug! I've been investigating the same issue

Thank you for working on this bug too!

> and have a concern about the fast-path in your patch.
>
> In kvm_gmem_get_folio(), the fast-path returns any existing folio from
> the page cache without checking if it's a large folio:
>
>	folio = __filemap_get_folio(inode->i_mapping, index,
>				    FGP_LOCK | FGP_ACCESSED, 0);
>	if (!IS_ERR(folio))
>		return folio; // <-- No size check here
>
> This means if a large folio was previously allocated (e.g., via

This is true, but I tried the above patch because back then I believed
that the filemap_add_folio() within the original
__filemap_get_folio_mpol() was the only place where folios get added to
the filemap.

I'm trying out another patch locally to disable khugepaged. I believe
the issue is that when MADV_HUGEPAGE is used on a guest_memfd vma, it
indirectly enables khugepaged to work on guest_memfd folios, which we
don't want anyway. My current guess is that the root-cause fix is to
disable khugepaged for guest_memfd, and I will be trying out a few
options through the rest of today. My first thought was to set
VM_NO_KHUGEPAGED (semantically suitable), but it looks like that
triggers some hugetlb-related weirdness. I'm going to try VM_DONTEXPAND
next.

Other notes: I trimmed the repro down by disabling calls 4, 5, 6 and 9;
those are not necessary for the repro. Calls 0, 1, 2 and 3 are part of a
regular usage pattern of guest_memfd, so the uncommon usages are call 7
(MADV_HUGEPAGE) (the likely culprit) and call 8. I believe call 8 is
just the trigger, since mlock() actually faults the page into the
userspace mapping through GUP.

> madvise(MADV_HUGEPAGE)), subsequent faults will find and return it

madvise(MADV_HUGEPAGE) does not allocate folios; it only marks the VMA
as allowing huge pages.

> from the fast-path, still triggering the WARN_ON_ONCE at line 416 in
> kvm_gmem_fault_user_mapping().
>
> The issue is that while your patch prevents *new* large folio
> allocations by hardcoding order=0 in filemap_alloc_folio(), it doesn't
> handle large folios that already exist in the page cache.
>
> Shouldn't we add a check for folio_test_large() on both the fast-path
> and slow-path to ensure we reject large folios regardless of how they
> were allocated? Something like:
>
>	folio = __filemap_get_folio(...);
>	if (!IS_ERR(folio))
>		goto check_folio;
>	// ... allocation code ...
> check_folio:
>	if (folio_test_large(folio)) {
>		folio_unlock(folio);
>		folio_put(folio);
>		return ERR_PTR(-E2BIG);

I saw your patch; I feel that returning -E2BIG doesn't address the root
cause of the issue.

>	}
>
> Or am I missing something about how the page cache handles this case?
>
> Thanks,
> Deepanshu Kartikey