From: Andrea Arcangeli <aarcange@redhat.com>
To: Anatoly Burakov <burakov.anatoly@gmail.com>
Cc: kvm@vger.kernel.org
Subject: Re: mmapping physical memory
Date: Mon, 26 Aug 2013 16:19:35 +0200
Message-ID: <20130826141935.GE25814@redhat.com>
In-Reply-To: <CAOBBzRxKREEVF3LnPUh4Rr+GbrhhMofenHJ+dvjmAe-C_rghag@mail.gmail.com>

Hi Anatoly,

On Mon, Aug 26, 2013 at 12:58:25PM +0100, Anatoly Burakov wrote:
> Hi all
> 
> I am using IVSHMEM to mmap /dev/mem into guest. The mmap works fine on
> QEMU without KVM support enabled, but with KVM i get kernel errors:
> 
> ***************************** (with EPT enabled)
> 
> [  746.940720] ------------[ cut here ]------------
> [  746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!

So the problem is that KVM cannot do put_page on a pfn coming from a
/dev/mem mapping, and it cannot handle VM_PFNMAP mappings without
PageReserved set. During kvm_release_page_* KVM only has the pfn
number of the page, and it has to decide whether this page is
refcounted or not based solely on the pfn number. So if the page is
not set as reserved, KVM cannot allow a mapping to be established, or
later during spte teardown put_page would run on the /dev/mem memory,
leading to memory corruption. The above BUG_ON isn't just a false
positive: it shows a limitation in the ability of the KVM page fault
to map any kind of memory coming from the host (including /dev/mem
mappings).
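
To make the pfn-only decision concrete, here is a minimal user-space
sketch. It is not the kvm_main.c code: struct model_page, model_mem
and the model_* helpers are hypothetical names, and the refcount is a
plain int. It only shows why a page that is neither reserved nor
pinned ends up with an unbalanced put_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host's per-page metadata. */
struct model_page {
    bool reserved;  /* models PageReserved */
    int  refcount;  /* models the struct page reference count */
};

#define MODEL_NR_PAGES 4
static struct model_page model_mem[MODEL_NR_PAGES];

/*
 * The release path only sees a pfn, so the reserved flag is its sole
 * hint about whether a reference was ever taken on the page.
 */
static bool model_pfn_is_refcounted(size_t pfn)
{
    assert(pfn < MODEL_NR_PAGES);
    return !model_mem[pfn].reserved;
}

/* Models kvm_release_page_*: drop a reference if the pfn looks refcounted. */
static void model_release_pfn(size_t pfn)
{
    if (model_pfn_is_refcounted(pfn))
        model_mem[pfn].refcount--;  /* models put_page() */
}
```

In this model, releasing a pinned ordinary page or a reserved page is
balanced, while releasing a non-reserved page that was never pinned
(the /dev/mem case) drives the count negative.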

So I'm suggesting to drop FOLL_GET in the page fault and
kvm_release_page_* after the spte establishment, and to rely entirely
on the mmu notifier and the kvm->mmu_lock instead. This would add a
vcpu->in_progress_fault_addr that is set before calling gup in
hva_to_pfn, cleared in the mmu notifier code within kvm->mmu_lock, and
checked within kvm->mmu_lock during spte establishment to know whether
the page pointer became stale, in which case we bail out and repeat
the fault.
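
A single-threaded user-space sketch of that protocol follows. Only
in_progress_fault_addr comes from the proposal; struct model_vcpu,
MODEL_NO_FAULT and the model_* functions are hypothetical, and the
real code would run the invalidate and commit steps under
kvm->mmu_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "no fault in flight". */
#define MODEL_NO_FAULT (~0UL)

struct model_vcpu {
    unsigned long in_progress_fault_addr;
};

/* Set before the gup lookup in hva_to_pfn. */
static void model_fault_begin(struct model_vcpu *v, unsigned long hva)
{
    assert(hva != MODEL_NO_FAULT);
    v->in_progress_fault_addr = hva;
}

/* mmu notifier side: clear the marker if [start, end) covers the fault. */
static void model_notifier_invalidate(struct model_vcpu *v,
                                      unsigned long start,
                                      unsigned long end)
{
    unsigned long addr = v->in_progress_fault_addr;

    if (addr != MODEL_NO_FAULT && addr >= start && addr < end)
        v->in_progress_fault_addr = MODEL_NO_FAULT;
}

/*
 * spte establishment side: returns true if the pfn looked up earlier
 * is still valid, false if the fault has to be repeated.
 */
static bool model_fault_commit(struct model_vcpu *v, unsigned long hva)
{
    bool still_valid = (v->in_progress_fault_addr == hva);

    v->in_progress_fault_addr = MODEL_NO_FAULT;
    return still_valid;
}
```

A fault whose address is invalidated between model_fault_begin and
model_fault_commit is reported as stale and would be retried; an
invalidation of an unrelated range leaves it untouched.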

We'll still need to use FOLL_GET and set_page_dirty in some cases,
like after modifying the page in places like
emulator_cmpxchg_emulated. Those places cannot depend on the mmu
notifier, and the dirty bit set in the pte isn't enough because the
page can be swapped out to disk and marked clean before kmap_atomic
runs. But 99% of the hva_to_pfn calls come from the KVM secondary
MMU page faults; they're protected by the mmu notifier and can
skip the refcounting completely, including FOLL_GET. And because
we won't have to run put_page at all anymore, the above BUG will
disappear too.

In terms of performance, I estimate the only cost will be an
"ACCESS_ONCE(vcpu->in_progress_fault_addr) = addr" per-thread,
cacheline-local and lockless initialization before calling gup in
hva_to_pfn, and the gain will be the removal of all refcounting
atomic_inc/dec and set_page_dirty from all the KVM page faults.
