Re: [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Keir Fraser <keir@xen.org>, Wei Liu <wei.liu2@citrix.com>
Subject: Re: [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
Date: Wed, 12 Aug 2015 18:24:19 +0100	[thread overview]
Message-ID: <55CB8143.9090402@citrix.com> (raw)
In-Reply-To: <55CB81AE020000780009A393@prv-mh.provo.novell.com>

On 12/08/15 16:26, Jan Beulich wrote:
>>>> On 12.08.15 at 17:13, <andrew.cooper3@citrix.com> wrote:
>> On 12/08/15 15:19, Jan Beulich wrote:
>>> +    if ( writable && *writable )
>>> +    {
>>> +        struct hvm_write_map *track = xmalloc(struct hvm_write_map);
>>> +
>>> +        if ( !track )
>>> +        {
>>> +            put_page(page);
>>> +            return NULL;
>>> +        }
>>> +        track->page = page;
>>> +        spin_lock(&d->arch.hvm_domain.write_map.lock);
>>> +        list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
>> Map and unmap of non-permenant pages will happen in a stacked fashion,
>> so putting non-persistent pages on the head of the list will be more
>> efficient when walking the list for removal.
> But this is dealing with permanent mappings.

So it does.  Sorry for the noise.

>
>>> +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
>>> +{
>>> +    struct hvm_write_map *track;
>>>  
>>> -    put_page(mfn_to_page(mfn));
>>> +    spin_lock(&d->arch.hvm_domain.write_map.lock);
>>> +    list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
>>> +        paging_mark_dirty(d, page_to_mfn(track->page));
>> This is potentially a long running operation.  It might be easier to
>> ASSERT() an upper limit of mapped pages than to make this restartable.
> I don't think I can come up with a sensible upper limit - do you have a
> good suggestion (including a rationale)? Also I don't view this as an
> immediate problem - as long as we're limiting HVM guests to 128 vCPU-s
> we're not going to see many more than 128 such mappings, and even
> those only from nested HVM code. (So to answer my question - 256
> would seem a reasonable limit for now.)
>
>>> --- a/xen/arch/x86/mm/paging.c
>>> +++ b/xen/arch/x86/mm/paging.c
>>> @@ -29,6 +29,7 @@
>>>  #include <asm/hvm/nestedhvm.h>
>>>  #include <xen/numa.h>
>>>  #include <xsm/xsm.h>
>>> +#include <public/sched.h> /* SHUTDOWN_suspend */
>>>  
>>>  #include "mm-locks.h"
>>>  
>>> @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
>>>  
>>>      if ( !resuming )
>>>      {
>>> +        /*
>>> +         * Mark dirty all currently write-mapped pages on the final iteration
>>> +         * of a HVM save operation.
>>> +         */
>>> +        if ( has_hvm_container_domain(d) && d->is_shut_down &&
>>> +             d->shutdown_code == SHUTDOWN_suspend )
>>> +            hvm_mapped_guest_frames_mark_dirty(d);
>> I am not sure whether this is useful.  There are situations such as
>> memory snapshot where it is insufficient, as the domain doesn't suspend.
> Perhaps the condition could be refined then? And then - isn't
> memory snapshot what Remus does (which I checked does a
> suspend in the tool stack)?

XenServer live memory snapshots of windows VMs uses a windows API to
quiesce IO, pauses the domain, performs a non-live save, then unpauses
the domain.

Such an action would want the these bits in the bitmap, but would not
match those conditions.

>
>> It would probably be better to hook this off a specific request from the
>> toolstack, as the migration code is in a far better position to know
>> whether this information is needed or can be deferred to the next iteration.
> Which next iteration (when we're talking about the final one)?
>
> I considered tool stack involvement, but would like to go that
> route only if unavoidable.

It is wrong for Xen to guess.  The toolstack should explicitly ask for
them when it wants them.

I have just written my slides for Seattle, which cover some of the
outstanding issues with regards to guests and logdirty.  As migration
with nested virt doesn't function at all, fixing these entries in the
logdirty bitmap is not urgent if you don't fancy extending/tweaking
xen_domctl_shadow_op.

~Andrew

next prev parent reply	other threads:[~2015-08-12 17:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-12 14:09 [PATCH v2 0/2] x86/HVM: hvm_map_guest_frame_rw() adjustments Jan Beulich
2015-08-12 14:16 ` [PATCH v2 1/2] " Jan Beulich
2015-08-12 15:44   ` Andrew Cooper
2015-08-13  9:22   ` Wei Liu
2015-08-13  9:33     ` Jan Beulich
2015-08-13  9:35       ` Wei Liu
2015-08-13 10:09         ` Jan Beulich
2015-08-14  7:28   ` Tian, Kevin
2015-08-12 14:19 ` [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw() Jan Beulich
2015-08-12 15:13   ` Andrew Cooper
2015-08-12 15:26     ` Jan Beulich
2015-08-12 17:24       ` Andrew Cooper [this message]
2015-08-13  6:32         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55CB8143.9090402@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=keir@xen.org \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.