All of lore.kernel.org
 help / color / mirror / Atom feed
From: Olaf Hering <olaf@aepfle.de>
To: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Cc: andres@gridcentric.ca, xen-devel@lists.xensource.com,
	tim@xen.org, adin@gridcentric.ca
Subject: Re: [PATCH 1 of 8] x86/mm: Fix paging_load
Date: Thu, 26 Jan 2012 13:43:04 +0100	[thread overview]
Message-ID: <20120126124304.GA3355@aepfle.de> (raw)
In-Reply-To: <251543dc172738014ff5bd13c5614c3d.squirrel@webmail.lagarcavilla.org>

On Thu, Jan 26, Andres Lagar-Cavilla wrote:

> > On Thu, Jan 26, Andres Lagar-Cavilla wrote:
> >
> >> Now, afaict, the p2m_ram_paging_in state is not needed anymore. Can you
> >> provide feedback as to whether
> >> 1. remove p2m_ram_paging_in
> >> 2. rename p2m_ram_paging_in_start to p2m_ram_paging_in
> >>
> >> sounds like a good plan?
> >
> > In my opinion the common case is that evicted pages get populated, an
> > request is sent. Later an response is expected to make room in the ring.
> >
> > If p2m_mem_paging_populate allocates a page for the guest, it can let
> > the pager know that it did so (or failed to allocate one).
> > If there is a page already, the pager can copy the gfn content into a
> > buffer, put a pointer to it in the response and let
> > p2m_mem_paging_resume() handle both the ring accounting (as it does now)
> > and also the copy_from_user.
> 
> So, this would bounce the page contents twice for the case when the page
> hasn't yet been evicted?

If it was not evicted, then nothing has to be done. In theory the page
could be resumed right away. But you wanted the notification, so a
kind-of dummy request has to be sent. The pager itself will figure out
quickly that the page was not evicted yet, so all it can do is to send a
dummy response.
I'm not sure what you mean with bouncing it twice. The pager itself has
likely written the page, so some IO happend. The hypervisor will find a
mfn for the gfn, right now it does nothing but doing p2mt changes.

> > If page allocation failed, the pager has to allocate one via
> > p2m_mem_paging_prep() as it is done now, as an intermediate step.
> 
> The issue of failed allocations is more pervasive. It also affects mem
> sharing. And even PoD. What I'm trying to say is that even though your
> solution seems to work (as long as the pager does dom0 ballooning to free
> up some memory in between populate and prep!), we need a more generic
> mechanism. Something along the lines of an ENOMEM mem_event ring, and a
> matching dom0 daemon.

I'm not sure how all that relates to putting an alloc_domheap_page()
into p2m_mem_paging_populate() and let the pager know about the results.

> >
> > The buffer page handling in the pager is probably simple, it needs to
> > maintain RING_SIZE() buffers. There cant be more than that in flight
> > because thats the limit of requests as well. In other words, the pager
> > does not need to wait for p2m_mem_paging_resume() to run and pull the
> > buffer content.
> >
> >
> > If the "populate - allocate - put_request - get_request - fill_buffer -
> > put_response - resume  get_response - copy_from_buffer - resume_vcpu"
> > cycle works, it would reduce the overall amount of work to be done
> > during paging, even if the hypercalls itself are not the bottleneck.
> > It all depends on the possibility to allocate a page in the various
> > contexts where p2m_mem_paging_populate is called.
> 
> The gist here is that the paging_load call would be removed?

No, it is required if alloc_domheap_page() in
p2m_mem_paging_populate() fails. Then the pager has to take the "slow
path" and try until it gets a page, like it does now.

> I like the general direction, but one excellent property of paging_resume
> is that it doesn't fail. This is particularly important since we already
> do resumes via ring responses and event channel kicks (see below). So, if
> resume needs to propagate failures back to the pager, things get icky.
> 
> (paging_load is restartable, see other email)

Once the request is sent, all what can happen is that the copy_from_user
fails. And in that case the guest is in an undefined state. Perhaps such
copy_from_user handling can be fixed to be reliable. In that case of
course my idea wont work.

> > The resume part could be done via eventchannel and
> > XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME could be removed.
> 
> This is already the case. I'm not eager to remove the domctl, but resuming
> via event channel kick only is in place.

It looks to me like resume is currently called twice in xenpaging, once
per ioctl and once per event channel.

> > Also the question is if freeing one p2mt is more important than reducing
> > the number if hypercalls to execute at runtime.
> 
> Agreed. However, eliminating code complexity is also useful, and these two
> ram_paging_in states cause everyone headaches.

Well, its not so alien to me after starring at and tracing it long
enough.

Olaf

  reply	other threads:[~2012-01-26 12:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-26  3:53 [PATCH 0 of 8] x86/mm fixes Andres Lagar-Cavilla
2012-01-26  3:53 ` [PATCH 1 of 8] x86/mm: Fix paging_load Andres Lagar-Cavilla
2012-01-26  9:46   ` Olaf Hering
2012-01-26 10:49     ` Andres Lagar-Cavilla
2012-01-26 12:05       ` Olaf Hering
2012-01-26 12:23         ` Andres Lagar-Cavilla
2012-01-26 12:43           ` Olaf Hering [this message]
2012-01-26  3:53 ` [PATCH 2 of 8] x86/mm: Fix p2m teardown locking Andres Lagar-Cavilla
2012-01-26  3:53 ` [PATCH 3 of 8] x86/mm: Allow foreign read-only mappings of shared pages Andres Lagar-Cavilla
2012-01-26  3:53 ` [PATCH 4 of 8] x86/mm: Output domain count of paged pages in console Andres Lagar-Cavilla
2012-01-26  9:47   ` Olaf Hering
2012-01-26  3:53 ` [PATCH 5 of 8] x86/mm: Remove stale variable from debugtrace printk in p2m audit Andres Lagar-Cavilla
2012-01-26  3:53 ` [PATCH 6 of 8] x86/mm: Properly account for paged out pages Andres Lagar-Cavilla
2012-01-26  9:54   ` Olaf Hering
2012-01-26 10:47     ` Andres Lagar-Cavilla
2012-01-26 12:11       ` Olaf Hering
2012-01-26 12:26         ` Andres Lagar-Cavilla
2012-01-26 13:08           ` Olaf Hering
2012-01-26  3:53 ` [PATCH 7 of 8] x86/mm: clean use of p2m unlocked queries Andres Lagar-Cavilla
2012-01-26  3:53 ` [PATCH 8 of 8] x86/mm: Avoid spurious deadlock panic trigger Andres Lagar-Cavilla
2012-01-26 13:31 ` [PATCH 0 of 8] x86/mm fixes Tim Deegan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120126124304.GA3355@aepfle.de \
    --to=olaf@aepfle.de \
    --cc=adin@gridcentric.ca \
    --cc=andres@gridcentric.ca \
    --cc=andres@lagarcavilla.org \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.