From: Sean Christopherson <seanjc@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Shuah Khan <shuah@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
rientjes@google.com, rick.p.edgecombe@intel.com,
yan.y.zhao@intel.com, fvdl@google.com, jthoughton@google.com,
vannapurve@google.com, shivankg@amd.com, michael.roth@amd.com,
pratyush@kernel.org, pasha.tatashin@soleen.com,
kalyazin@amazon.com, tabba@google.com,
Vlastimil Babka <vbabka@kernel.org>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH RFC v3 2/4] KVM: guest_memfd: Set release always on guest_memfd mappings
Date: Mon, 9 Mar 2026 18:06:17 -0700 [thread overview]
Message-ID: <aa9uiQ_KBcX7X0My@google.com> (raw)
In-Reply-To: <20260309-gmem-st-blocks-v3-2-815f03d9653e@google.com>
On Mon, Mar 09, 2026, Ackerley Tng wrote:
> Set release always on guest_memfd mappings to enable the use of
> .invalidate_folio, which performs inode accounting for guest_memfd.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
> virt/kvm/guest_memfd.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 77219551056a7..8246b9fbcf832 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -607,6 +607,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> mapping_set_inaccessible(inode->i_mapping);
> /* Unmovable mappings are supposed to be marked unevictable as well. */
> WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
> + mapping_set_release_always(inode->i_mapping);
*sigh*
So... an internal AI review bot flagged setting AS_RELEASE_ALWAYS as being
potentially problematic, and I started poking around, mostly because I was
curious. I'm pretty sure the exact scenario painted by the bot isn't possible,
but I do think a similar issue exists in at least truncate_error_folio(). Or at
least, *should* exist, but doesn't because of a different bug.
On memory error, kvm_gmem_error_folio() will get invoked via this code. Note
the "err != 0" check. kvm_gmem_error_folio() returns MF_DELAYED, which has an
arbitrary value of '2', and so KVM is always signalling "failure".
int err = mapping->a_ops->error_remove_folio(mapping, folio);
if (err != 0)
pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
else if (!filemap_release_folio(folio, GFP_NOIO))
pr_info("%#lx: failed to release buffers\n", pfn);
I _think_ that's bad? On x86, if I'm following the breadcrubs correctly, we'll
end up in this code in kill_me_maybe()
pr_err("Memory error not recovered");
kill_me_now(cb);
and send what I assume is a relatively useless SIGBUS and likely kill the VM.
struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
p->mce_count = 0;
force_sig(SIGBUS);
But even if that's somehow the "right" behavior, we're doing it purely by
accident.
As for this patch, if we fix that bug by returning 0, then filemap_release_folio()
is definitely reachable by at least one flow, so I think guest_memfd also needs
to implement release_folio()?
Full AI bot text:
--
Setting the AS_RELEASE_ALWAYS flag causes folio_needs_release() to return
true. This correctly triggers .invalidate_folio during truncation, but does
it also unintentionally expose guest_memfd folios to eviction via
posix_fadvise(POSIX_FADV_DONTNEED)?
If userspace calls posix_fadvise() on a guest_memfd file, the core mm
calls mapping_evict_folio(). Because folio_needs_release() is true, it
calls filemap_release_folio().
Since guest_memfd does not implement a .release_folio address space
operation, filemap_release_folio() falls back to calling
try_to_free_buffers(). Could this fallback cause a warning?
fs/buffer.c:try_to_free_buffers() {
...
/* Misconfigured folio check */
if (WARN_ON_ONCE(!folio_buffers(folio)))
return true;
...
}
Because the guest_memfd folio has no private data, folio_buffers()
is NULL, which will trigger this WARN_ON_ONCE.
Furthermore, try_to_free_buffers() returns true, allowing the folio to be
removed from the page cache. Because this eviction path bypasses
truncate_cleanup_folio(), it never calls .invalidate_folio.
Does this mean inode_sub_bytes() is skipped, leaking the inode block
accounting?
Userspace could potentially trigger the warning and infinitely inflate the
inode's block count with:
struct kvm_create_guest_memfd args = { .size = 4096 };
int fd = ioctl(kvm_vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
fallocate(fd, 0, 0, 4096);
posix_fadvise(fd, 0, 4096, POSIX_FADV_DONTNEED);
Should guest_memfd implement a .release_folio callback that simply
returns false to prevent these folios from being evicted?
--
next prev parent reply other threads:[~2026-03-10 1:06 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-09 9:53 [PATCH RFC v3 0/4] guest_memfd: Track amount of memory allocated on inode Ackerley Tng
2026-03-09 9:53 ` [PATCH RFC v3 1/4] KVM: " Ackerley Tng
2026-03-09 11:50 ` David Hildenbrand (Arm)
2026-03-09 15:45 ` Ackerley Tng
2026-03-09 20:14 ` Sean Christopherson
2026-03-09 9:53 ` [PATCH RFC v3 2/4] KVM: guest_memfd: Set release always on guest_memfd mappings Ackerley Tng
2026-03-09 11:42 ` Jan Kara
2026-03-10 1:06 ` Sean Christopherson [this message]
2026-03-10 9:12 ` Ackerley Tng
2026-03-12 19:00 ` Sean Christopherson
2026-03-09 9:53 ` [PATCH RFC v3 3/4] KVM: selftests: Wrap fstat() to assert success Ackerley Tng
2026-03-09 9:53 ` [PATCH RFC v3 4/4] KVM: selftests: Test that st_blocks is updated on allocation Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa9uiQ_KBcX7X0My@google.com \
--to=seanjc@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=david@kernel.org \
--cc=fvdl@google.com \
--cc=jack@suse.cz \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=michael.roth@amd.com \
--cc=pasha.tatashin@soleen.com \
--cc=pbonzini@redhat.com \
--cc=pratyush@kernel.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
--cc=shivankg@amd.com \
--cc=shuah@kernel.org \
--cc=surenb@google.com \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
--cc=vbabka@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox