All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Tamas K Lengyel <tamas@tklengyel.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	Henry Wang <Henry.Wang@arm.com>
Subject: Re: [PATCH v4 5/9] x86/mem-sharing: copy GADDR based shared guest areas
Date: Thu, 28 Sep 2023 13:07:52 +0200	[thread overview]
Message-ID: <ZRVeiAFlyf1LJ2qR@MacBookPdeRoger> (raw)
In-Reply-To: <d285a456-307a-0a36-0910-cb80f419627d@suse.com>

On Thu, Sep 28, 2023 at 12:11:02PM +0200, Jan Beulich wrote:
> On 28.09.2023 11:51, Roger Pau Monné wrote:
> > On Thu, Sep 28, 2023 at 09:16:20AM +0200, Jan Beulich wrote:
> >> --- a/xen/arch/x86/mm/mem_sharing.c
> >> +++ b/xen/arch/x86/mm/mem_sharing.c
> >> @@ -1641,6 +1641,68 @@ static void copy_vcpu_nonreg_state(struc
> >>      hvm_set_nonreg_state(cd_vcpu, &nrs);
> >>  }
> >>  
> >> +static int copy_guest_area(struct guest_area *cd_area,
> >> +                           const struct guest_area *d_area,
> >> +                           struct vcpu *cd_vcpu,
> >> +                           const struct domain *d)
> >> +{
> >> +    mfn_t d_mfn, cd_mfn;
> >> +
> >> +    if ( !d_area->pg )
> >> +        return 0;
> >> +
> >> +    d_mfn = page_to_mfn(d_area->pg);
> >> +
> >> +    /* Allocate & map a page for the area if it hasn't been already. */
> >> +    if ( !cd_area->pg )
> >> +    {
> >> +        gfn_t gfn = mfn_to_gfn(d, d_mfn);
> >> +        struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
> >> +        p2m_type_t p2mt;
> >> +        p2m_access_t p2ma;
> >> +        unsigned int offset;
> >> +        int ret;
> >> +
> >> +        cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
> >> +        if ( mfn_eq(cd_mfn, INVALID_MFN) )
> >> +        {
> >> +            struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
> >> +
> >> +            if ( !pg )
> >> +                return -ENOMEM;
> >> +
> >> +            cd_mfn = page_to_mfn(pg);
> >> +            set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
> >> +
> >> +            ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> >> +                                 p2m->default_access, -1);
> >> +            if ( ret )
> >> +                return ret;
> >> +        }
> > 
> > I'm still unsure why map_guest_area() shouldn't be able to deal with a
> > forked child needing the page to be mapped.  What happens when a
> > forked child executes the hypercall to map such areas against not yet
> > populated gfns?
> > 
> > Shouldn't map_guest_area() be capable of handling those calls and
> > populating on demand?
> 
> Funny you should use this wording: P2M_ALLOC will deal with populating
> PoD slots, yes, but aiui forks don't fill their p2m with all PoD slots,
> but rather leave them empty. Yet again I need to leave it to Tamas to
> confirm or prove me wrong.

If the child p2m populating is only triggered by guest accesses then a
lot of hypercalls are likely to not work correctly on childs.

> 
> >> +        else if ( p2mt != p2m_ram_rw )
> >> +            return -EBUSY;
> >> +
> >> +        /*
> >> +         * Map the area into the guest. For simplicity specify the entire range
> >> +         * up to the end of the page: All the function uses it for is to check
> >> +         * that the range doesn't cross page boundaries. Having the area mapped
> >> +         * in the original domain implies that it fits there and therefore will
> >> +         * also fit in the clone.
> >> +         */
> >> +        offset = PAGE_OFFSET(d_area->map);
> >> +        ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
> >> +                             PAGE_SIZE - offset, cd_area, NULL);
> >> +        if ( ret )
> >> +            return ret;
> >> +    }
> >> +    else
> >> +        cd_mfn = page_to_mfn(cd_area->pg);
> >> +
> >> +    copy_domain_page(cd_mfn, d_mfn);
> > 
> > I think the page copy should be done only once, when the page is
> > populated on the child p2m.  Otherwise areas smaller than a page size
> > (like vpcu_time_info_t) that share the same page will get multiple
> > copies of the same data for no reason.
> 
> I think you're right, but this would then be another issue in the original
> code that I merely didn't spot (and it's not just "copy for no reason",
> we'd actually corrupt what was put there before). IOW the copying needs to
> move ahead of map_guest_area() (or yet more precisely after the error
> checking for p2m->set_entry()), and in the original code it would have
> needed to live ahead of map_vcpu_info(). Once again I'd like Tamas to
> confirm (or otherwise) before making that change, though.

Yes, it's already an issue in the current code.  I wonder whether
logic in the guest or Xen could malfunctions due to the fact that
map_vcpu_info() unconditionally sets evtchn_upcall_pending and injects
an event channel upcall, but the later call to copy_domain_page()
might unset evtchn_upcall_pending while the vector is already injected.

Thanks, Roger.


  reply	other threads:[~2023-09-28 11:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-28  6:59 [PATCH v4 0/9] runstate/time area registration by (guest) physical address Jan Beulich
2023-09-28  7:01 ` [PATCH v4 1/9] x86/shim: zap runstate and time area handles during shutdown Jan Beulich
2023-09-28  9:27   ` Roger Pau Monné
2023-09-28  7:14 ` [PATCH v4 2/9] domain: GADDR based shared guest area registration alternative - teardown Jan Beulich
2023-09-28  7:15 ` [PATCH v4 3/9] domain: update GADDR based runstate guest area Jan Beulich
2023-09-28 12:15   ` Julien Grall
2023-09-28  7:15 ` [PATCH v4 4/9] x86: update GADDR based secondary time area Jan Beulich
2023-09-28  7:16 ` [PATCH v4 5/9] x86/mem-sharing: copy GADDR based shared guest areas Jan Beulich
2023-09-28  9:51   ` Roger Pau Monné
2023-09-28 10:11     ` Jan Beulich
2023-09-28 11:07       ` Roger Pau Monné [this message]
     [not found]         ` <CABfawhmAZGTaKZfmwY4t8599i6qKaTOJ-fngFmtvS4LhJMh7iA@mail.gmail.com>
2023-09-28 13:19           ` Jan Beulich
     [not found]             ` <CABfawhkK7LVQKhTtCAMpoaGLH24SwpyEAJGpu-PohR5e6W=pMw@mail.gmail.com>
2023-10-16  9:50               ` Jan Beulich
2023-09-28 14:08           ` Roger Pau Monné
     [not found]             ` <CABfawhmfFuB14CiPdyaP4HNayms0AG5=4h1FNURu4MDgt2HzAg@mail.gmail.com>
2023-09-29 14:17               ` Roger Pau Monné
2023-09-28  7:16 ` [PATCH v4 6/9] domain: map/unmap " Jan Beulich
2023-09-28 10:04   ` Roger Pau Monné
2023-09-28 10:14     ` Jan Beulich
2023-09-28 11:10       ` Roger Pau Monné
2023-09-28  7:17 ` [PATCH v4 7/9] domain: introduce GADDR based runstate area registration alternative Jan Beulich
2023-09-28  7:17 ` [PATCH v4 8/9] x86: introduce GADDR based secondary time " Jan Beulich
2023-09-28  7:18 ` [PATCH v4 9/9] common: convert vCPU info area registration Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRVeiAFlyf1LJ2qR@MacBookPdeRoger \
    --to=roger.pau@citrix.com \
    --cc=Henry.Wang@arm.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=sstabellini@kernel.org \
    --cc=tamas@tklengyel.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.