All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v5 5/5] KVM: PPC: e500: MMU API
Date: Tue, 19 Jul 2011 11:20:33 +0000	[thread overview]
Message-ID: <20110719112033.GA19571@cmpxchg.org> (raw)
In-Reply-To: <20110707234159.GE6646@schlenkerla.am.freescale.net>

On Tue, Jul 19, 2011 at 10:51:40AM +0200, Alexander Graf wrote:
> 
> On 19.07.2011, at 10:36, Johannes Weiner wrote:
> 
> > On Mon, Jul 18, 2011 at 11:44:02PM +0200, Alexander Graf wrote:
> >> 
> >> On 18.07.2011, at 20:08, Scott Wood wrote:
> >> 
> >>> On Mon, 18 Jul 2011 18:33:58 +0200
> >>> Alexander Graf <agraf@suse.de> wrote:
> >>> 
> >>>> 
> >>>> On 18.07.2011, at 18:18, Scott Wood wrote:
> >>>> 
> >>>>> They're pinned by get_user_pages_fast().  We (potentially) write to them, so
> >>>>> we should mark them dirty, because they are dirty.  It's up to the rest
> >>>>> of Linux what to do with that.  Will being pinned stop updates from being
> >>>>> written out if it is file-backed?  And eventually the vm will be destroyed
> >>>>> (or the tlb reconfigured) and the pages will be unpinned.
> >>>> 
> >>>> Hrm. How much overhead do we add to the exit-to-userspace path with this?
> >>> 
> >>> Not sure -- probably not too much for anonymous memory, compared to the
> >>> rest of the cost of a heavyweight exit.  On e500 the tlb array is 4 pages,
> >>> and each set_page_dirty_lock() will just do a few bit operations.
> >> 
> >> Hm, ok.
> >> 
> >>> 
> >>>> I completely agree that we should mark them dirty when closing, but I'm
> >>>> not fully convinced a "we dirty them so we should declare them dirty at
> >>>> random times" pays off against possible substantial slowdowns due to the
> >>>> marking. Keep in mind that this is the MMIO case which isn't _that_ seldom.
> >>> 
> >>> If we can convince ourselves nothing bad can happen, fine.  I did it here
> >>> because this is the point at which the API says the contents of the memory
> >>> are well-defined.  If it is file-backed, and userspace does a sync on a
> >>> heavyweight exit, shouldn't the the right thing get written to disk?  Could
> >>> any other weird things happen?  I'm not familiar enough with that part of
> >>> the kernel to say right away that it's safe.
> >> 
> >> I'm neither, these are pretty subtile grounds. CC'ing Andrea and
> >> Johannes. Guys, would you please take a look at that patch and tell
> >> us if it's safe and a good thing to do what's being done here?
> >> 
> >> We're talking about the following patch: http://www.spinics.net/lists/kvm-ppc/msg02961.html
> >> and specifically about:
> >> 
> >>> +void kvmppc_core_heavy_exit(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
> >>> +	int i;
> >>> +
> >>> +	/*
> >>> +	 * We may have modified the guest TLB, so mark it dirty.
> >>> +	 * We only do it on an actual return to userspace, to avoid
> >>> +	 * adding more overhead to getting scheduled out -- and avoid
> >>> +	 * any locking issues with getting preempted in the middle of
> >>> +	 * KVM_CONFIG_TLB, etc.
> >>> +	 */
> >>> +
> >>> +	for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++)
> >>> +		set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]);
> >>> }
> >> 
> >> The background is that we want to have a few shared pages between
> >> kernel and user space to keep guest TLB information in. It's too
> >> much to shove around every time we need it in user space.
> > 
> > Is there a strict requirement to have these pages originate from
> > userspace?  Usually, shared memory between kernel and userspace is
> > owned by the driver and kept away from the mm subsystem completely.
> > 
> > You could allocate the memory in the driver when userspace issues an
> > ioctl like KVM_ALLOC_TLB_CONFIG and return a file handle that can be
> > mmap'd.  The array length info is maintained in the vma for the
> > kernel, userspace must remember the size of mmap regions anyway.
> 
> Hrm. What's the advantage of doing it this way around vs the other?

You don't have to work around the mm subsystem trying to reclaim your
memory, maintain disk coherency that is guaranteed by the filebacked
memory semantics etc.

If your driver provides the memory, there are much less assumptions
from userspace that you have to consider and memory management will
not interfere either.

  parent reply	other threads:[~2011-07-19 11:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-07 23:41 [PATCH v5 5/5] KVM: PPC: e500: MMU API Scott Wood
2011-07-08 12:57 ` Alexander Graf
2011-07-18 10:09 ` Alexander Graf
2011-07-18 16:18 ` Scott Wood
2011-07-18 16:33 ` Alexander Graf
2011-07-18 18:08 ` Scott Wood
2011-07-18 21:44 ` Alexander Graf
2011-07-19  8:36 ` Johannes Weiner
2011-07-19  8:51 ` Alexander Graf
2011-07-19 11:20 ` Johannes Weiner [this message]
2011-07-24  9:16 ` Alexander Graf
2011-07-25 19:25 ` Scott Wood
2011-07-25 21:50 ` Alexander Graf
2011-08-08  8:49 ` Johannes Weiner
2011-08-08 23:13 ` Scott Wood
2011-08-13 15:14 ` Benjamin Herrenschmidt
2011-08-15 15:03 ` Scott Wood
2011-08-15 15:15 ` Benjamin Herrenschmidt
2011-08-15 20:55 ` Scott Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110719112033.GA19571@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=kvm-ppc@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.