From: Chao Peng <chao.p.peng@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
"J . Bruce Fields" <bfields@fieldses.org>,
Andrew Morton <akpm@linux-foundation.org>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
luto@kernel.org, john.ji@intel.com, susie.li@intel.com,
jun.nakajima@intel.com, dave.hansen@intel.com,
ak@linux.intel.com, david@redhat.com
Subject: Re: [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset
Date: Fri, 31 Dec 2021 10:26:36 +0800
Message-ID: <20211231022636.GA7025@chaop.bj.intel.com>
In-Reply-To: <YcuGGCo5pR31GkZE@google.com>

On Tue, Dec 28, 2021 at 09:48:08PM +0000, Sean Christopherson wrote:
> On Fri, Dec 24, 2021, Chao Peng wrote:
> > On Thu, Dec 23, 2021 at 06:02:33PM +0000, Sean Christopherson wrote:
> > > On Thu, Dec 23, 2021, Chao Peng wrote:
> > >
> > > In other words, there needs to be a 1:1 gfn:file+offset mapping. Since userspace
> > > likely wants to allocate a single file for guest private memory and map it into
> > > multiple discontiguous slots, e.g. to skip the PCI hole, the best idea off the top
> > > of my head would be to register the notifier on a per-slot basis, not a per-VM
> > > basis. It would require a 'struct kvm *' in 'struct kvm_memory_slot', but that's
> > > not a huge deal.
> > >
> > > That way, KVM's notifier callback already knows the memslot and can compute overlap
> > > between the memslot and the range by reversing the math done by kvm_memfd_get_pfn().
> > > Then, armed with the gfn and slot, invalidation is just a matter of constructing
> > > a struct kvm_gfn_range and invoking kvm_unmap_gfn_range().
> >
> > KVM is easy, but the kernel bits would be difficult: it has to maintain
> > an fd+offset to memslot mapping because one fd can have multiple memslots,
> > and it needs to decide which memslot to notify.
>
> No, the kernel side maintains an opaque pointer like it does today,
But the opaque pointer will now be the memslot, won't it? That means the
kernel side should maintain a list of opaque pointers (memslots) for each
fd (inode) instead of a single one, since the fd-to-memslot mapping is now 1:M.
> KVM handles
> reverse engineering the memslot to get the offset and whatever else it needs.
> notify_fallocate() and other callbacks are unchanged, though they probably can
> drop the inode.
>
> E.g. likely with bad math and handwaving on the overlap detection:
>
> int kvm_private_fd_fallocate_range(void *owner, pgoff_t start, pgoff_t end)
> {
> 	struct kvm_memory_slot *slot = owner;
> 	struct kvm_gfn_range gfn_range = {
> 		.slot = slot,
> 		.start = (start - slot->private_offset) >> PAGE_SHIFT,
> 		.end = (end - slot->private_offset) >> PAGE_SHIFT,
> 		.may_block = true,
> 	};
>
> 	if (!has_overlap(slot, start, end))
> 		return 0;
>
> 	gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages);
>
> 	kvm_unmap_gfn_range(slot->kvm, &gfn_range);
> 	return 0;
> }
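The quoted callback describes its own math as rough. Below is a minimal userspace sketch of one way to do the translation, assuming the slot records where it starts in the file as a page index (the hypothetical `private_pgoff`) and that `start`/`end` are page indices forming a half-open range `[start, end)`. The range is clipped to the slot first and then rebased onto `base_gfn`, so no separate `min()` against the slot end is needed. `struct memslot`, `struct gfn_range` and `pgoff_range_to_gfn_range()` are illustrative stand-ins, not KVM's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;
typedef unsigned long pgoff_t;

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct memslot {
	gfn_t base_gfn;          /* first guest frame covered by the slot */
	unsigned long npages;    /* slot size in pages */
	pgoff_t private_pgoff;   /* assumed: file page index backing base_gfn */
};

/* Hypothetical stand-in for struct kvm_gfn_range; end is exclusive. */
struct gfn_range {
	gfn_t start;
	gfn_t end;
};

/*
 * Translate the file page range [start, end) into the gfn range it covers
 * in @slot. Returns false when the file range misses the slot entirely.
 */
static bool pgoff_range_to_gfn_range(const struct memslot *slot,
				     pgoff_t start, pgoff_t end,
				     struct gfn_range *out)
{
	pgoff_t slot_start = slot->private_pgoff;
	pgoff_t slot_end = slot->private_pgoff + slot->npages;

	assert(start <= end);	/* callers pass a half-open range */

	/* Clip the file range to the part of the file backed by this slot. */
	if (start < slot_start)
		start = slot_start;
	if (end > slot_end)
		end = slot_end;
	if (start >= end)
		return false;	/* no overlap: nothing to unmap */

	/* Keep the offset within the slot, but relative to base_gfn. */
	out->start = slot->base_gfn + (start - slot_start);
	out->end = slot->base_gfn + (end - slot_start);
	return true;
}
```

With one file split around the PCI hole (slot A at gfn 0 backed by file pages `[0, 0xc0000)`, slot B at gfn `0x100000` backed by `[0xc0000, 0x100000)`), a hole punched across `[0xbfff0, 0xc0010)` maps to gfns `[0xbfff0, 0xc0000)` in A and `[0x100000, 0x100010)` in B.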
I understand the KVM-side handling, but again, one fd can have multiple
memslots. When shmem invokes notify_fallocate(), how does it decide which
memslot(s) in the list to notify? Or does it just notify all the possible
memslots and let KVM check?
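For the "notify all the possible memslots" option, a minimal userspace sketch, assuming the backing store keeps a per-inode list of (owner, callback) registrations and never looks inside the owner: it broadcasts each fallocate to every registration, and each callback discards ranges that miss its slot. All names here (`struct notifier_reg`, `struct inode_notifiers`, `notify_fallocate_all()`, `struct demo_slot`) are hypothetical and are not the MEMFD_OPS interface from this series.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;

/* Hypothetical registration record, one per memslot bound to the fd. */
struct notifier_reg {
	void *owner;	/* opaque to the backing store; KVM would pass a memslot */
	int (*fallocate)(void *owner, pgoff_t start, pgoff_t end);
	struct notifier_reg *next;
};

/* Hypothetical per-inode head: fd:memslot is 1:M, so a list, not one pointer. */
struct inode_notifiers {
	struct notifier_reg *head;
};

static void notifiers_register(struct inode_notifiers *n, struct notifier_reg *reg)
{
	assert(reg && reg->fallocate);
	reg->next = n->head;
	n->head = reg;
}

/* Backing-store side: no knowledge of memslots, just broadcast. */
static int notify_fallocate_all(struct inode_notifiers *n, pgoff_t start, pgoff_t end)
{
	struct notifier_reg *reg;
	int ret;

	for (reg = n->head; reg; reg = reg->next) {
		ret = reg->fallocate(reg->owner, start, end);
		if (ret)
			return ret;	/* stop at the first failing owner */
	}
	return 0;
}

/* Example owner: a slot backed by file pages [pgoff, pgoff + npages). */
struct demo_slot {
	pgoff_t pgoff;
	unsigned long npages;
	unsigned long hits;	/* notifications that actually overlapped */
};

static int demo_slot_fallocate(void *owner, pgoff_t start, pgoff_t end)
{
	struct demo_slot *slot = owner;

	/* Callback-side filtering: ignore ranges outside this slot. */
	if (end <= slot->pgoff || start >= slot->pgoff + slot->npages)
		return 0;
	slot->hits++;	/* a real owner would unmap the overlapping gfns here */
	return 0;
}
```

The broadcast costs one callback per registered slot on every fallocate; an interval tree keyed by file offset, along the lines of the ofs_tree in this patch's subject, would let the lookup skip slots that cannot overlap.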
Thanks,
Chao