Re: [RFC PATCH v1 00/10] guest_memfd: Track amount of memory allocated on inode


From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Ackerley Tng <ackerleytng@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, willy@infradead.org,
	pbonzini@redhat.com, shuah@kernel.org, seanjc@google.com,
	shivankg@amd.com, rick.p.edgecombe@intel.com,
	yan.y.zhao@intel.com, rientjes@google.com, fvdl@google.com,
	jthoughton@google.com, vannapurve@google.com,
	pratyush@kernel.org, pasha.tatashin@soleen.com,
	kalyazin@amazon.com, tabba@google.com, michael.roth@amd.com
Subject: Re: [RFC PATCH v1 00/10] guest_memfd: Track amount of memory allocated on inode
Date: Tue, 24 Feb 2026 16:26:30 +0100
Message-ID: <9ef9a0bd-4cff-4518-b7fb-e65c9b761a5a@kernel.org>
In-Reply-To: <CAEvNRgFF0+g9pmp1yitX48ebK=fDpYKSOQDmRfOjzSHxM5UpeQ@mail.gmail.com>

On 2/24/26 00:42, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> 
>> On 2/23/26 08:04, Ackerley Tng wrote:
>>> Hi,
>>>
>>> Currently, guest_memfd doesn't update inode's i_blocks or i_bytes at
>>> all. Hence, st_blocks in the struct populated by a userspace fstat()
>>> call on a guest_memfd will always be 0. This patch series makes
>>> guest_memfd track the amount of memory allocated on an inode, which
>>> allows fstat() to accurately report that on requests from userspace.
>>>
>>> The inode's i_blocks and i_bytes fields are updated when the folio is
>>> associated or disassociated from the guest_memfd inode, which are at
>>> allocation and truncation times respectively.
>>>
>>> To update inode fields at truncation time, this series implements a
>>> custom truncation function for guest_memfd. An alternative would be to
>>> update truncate_inode_pages_range() to return the number of bytes
>>> truncated or add/use some hook.
>>>
>>> Implementing a custom truncation function was chosen to provide
>>> flexibility for handling truncations in future when guest_memfd
>>> supports sources of pages other than the buddy allocator. This
>>> approach of a custom truncation function also aligns with shmem, which
>>> has a custom shmem_truncate_range().
>>
>> Just wondered how shmem does it: it's through
>> dquot_alloc_block_nodirty() / dquot_free_block_nodirty().
>>
>> It's a shame we can't just use folio_free().
> 
> Yup, Hugh pointed out that struct address_space *mapping (and inode) may already
> have been freed by the time .free_folio() is called [1].
> 
> [1] https://lore.kernel.org/all/7c2677e1-daf7-3b49-0a04-1efdf451379a@google.com/
> 
>> Could we maybe have a
>> different callback (when the mapping is still guaranteed to be around)
>> from where we could update i_blocks on the freeing path?
> 
> Do you mean that we should add a new callback to struct
> address_space_operations?

If that avoids having to implement truncation completely ourselves, that might be one
option we could discuss, yes.

Something like:

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..94f8bb81f017 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -764,6 +764,7 @@ cache in your filesystem.  The following members are defined:
                sector_t (*bmap)(struct address_space *, sector_t);
                void (*invalidate_folio) (struct folio *, size_t start, size_t len);
                bool (*release_folio)(struct folio *, gfp_t);
+               void (*remove_folio)(struct folio *folio);
                void (*free_folio)(struct folio *);
                ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
                int (*migrate_folio)(struct mapping *, struct folio *dst,
@@ -922,6 +923,11 @@ cache in your filesystem.  The following members are defined:
        its release_folio will need to ensure this.  Possibly it can
        clear the uptodate flag if it cannot free private data yet.
 
+``remove_folio``
+       remove_folio is called just before the folio is removed from the
+       page cache in order to allow the cleanup of properties (e.g.,
+       accounting) that need the address_space mapping.
+
 ``free_folio``
        free_folio is called once the folio is no longer visible in the
        page cache in order to allow the cleanup of any private data.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..f7f6930977a1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -422,6 +422,7 @@ struct address_space_operations {
        sector_t (*bmap)(struct address_space *, sector_t);
        void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
        bool (*release_folio)(struct folio *, gfp_t);
+       void (*remove_folio)(struct folio *folio);
        void (*free_folio)(struct folio *folio);
        ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
        /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 6cd7974d4ada..5a810eaacab2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -250,8 +250,14 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio)
 void filemap_remove_folio(struct folio *folio)
 {
        struct address_space *mapping = folio->mapping;
+       void (*remove_folio)(struct folio *);
 
        BUG_ON(!folio_test_locked(folio));
+
+       remove_folio = mapping->a_ops->remove_folio;
+       if (unlikely(remove_folio))
+               remove_folio(folio);
+
        spin_lock(&mapping->host->i_lock);
        xa_lock_irq(&mapping->i_pages);
        __filemap_remove_folio(folio, NULL);


Ideally we'd perform it under the lock just after clearing folio->mapping, but I guess that
might be more controversial.

For the accounting you need, the above might be good enough, but I am not sure how many
other use cases there might be.
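
To illustrate how such a callback could carry the accounting, here is a
minimal userspace model of the ordering in the diff above. It is a sketch
only, not kernel code: the model_* types are simplified stand-ins,
gmem_remove_folio() is a hypothetical guest_memfd hook, and charging
i_blocks at add time is an assumption taken from the cover letter's
"at allocation" wording. Locking (i_lock, xa_lock) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096UL /* assumed page size for the model */
#define MODEL_SECTOR_SIZE 512UL  /* st_blocks counts 512-byte units */

struct folio;

struct model_inode {
	unsigned long i_blocks;	/* what fstat() would report as st_blocks */
};

struct model_aops {
	void (*remove_folio)(struct folio *folio);	/* optional, may be NULL */
};

struct model_mapping {
	struct model_inode *host;
	const struct model_aops *a_ops;
	unsigned long nrpages;
};

struct folio {
	struct model_mapping *mapping;
	unsigned long nr_pages;
	bool locked;
};

static unsigned long folio_blocks(const struct folio *folio)
{
	return folio->nr_pages * (MODEL_PAGE_SIZE / MODEL_SECTOR_SIZE);
}

/* Allocation side: charge the folio when it is attached to the mapping. */
static void model_add_folio(struct model_mapping *mapping, struct folio *folio)
{
	folio->mapping = mapping;
	mapping->nrpages += folio->nr_pages;
	mapping->host->i_blocks += folio_blocks(folio);
}

/* Hypothetical guest_memfd hook: uncharge while folio->mapping is valid. */
static void gmem_remove_folio(struct folio *folio)
{
	folio->mapping->host->i_blocks -= folio_blocks(folio);
}

/* Models the patched filemap_remove_folio(): callback first, then detach. */
static void model_filemap_remove_folio(struct folio *folio)
{
	struct model_mapping *mapping = folio->mapping;

	assert(folio->locked);	/* stands in for BUG_ON(!folio_test_locked()) */

	if (mapping->a_ops->remove_folio)
		mapping->a_ops->remove_folio(folio);

	/* Stands in for __filemap_remove_folio(): drop from the page cache. */
	mapping->nrpages -= folio->nr_pages;
	folio->mapping = NULL;
}
```

The point of the ordering is that the hook runs before folio->mapping is
cleared, so it can still reach the inode, which .free_folio() cannot rely on.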

-- 
Cheers,

David
