From: George Dunlap <george.dunlap@citrix.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>, Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
George Dunlap <george.dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
xen-devel@lists.xen.org, Paul Durrant <paul.durrant@citrix.com>,
zhiyuan.lv@intel.com, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
Date: Fri, 7 Apr 2017 15:22:12 +0100 [thread overview]
Message-ID: <751cce04-c55b-f7f1-d0fa-cafb16ceef75@citrix.com> (raw)
In-Reply-To: <58E79C92.8070703@linux.intel.com>
On 07/04/17 15:05, Yu Zhang wrote:
>
>
> On 4/7/2017 9:56 PM, George Dunlap wrote:
>> On 07/04/17 12:31, Jan Beulich wrote:
>>>>>> On 07.04.17 at 12:55, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain
>>>>>>>> *p2m,
>>>>>>>> if ( is_epte_valid(ept_entry) )
>>>>>>>> {
>>>>>>>> if ( (recalc || ept_entry->recalc) &&
>>>>>>>> - p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>>>> + p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>>>> also think this is part of the problem above: Distinguishing
>>>>>>> log-dirty
>>>>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>>>>> to have two p2m_*_changeable() flavors.
>>>>>> Well, I think we have also discussed this quite long ago, here is
>>>>>> the link.
>>>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>>>> That was a different discussion, namely not considering this ...
>>>>>
>>>>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>>>>> need amending.
>>>>>>>
>>>>>>> In the end it looks like you have the inverse problem here compared
>>>>>>> to above: You should return ram-rw when the reset was already
>>>>>>> initiated. At least that's how I would see the logic to match up
>>>>>>> with
>>>>>>> the log-dirty handling (where the _effective_ rather than the last
>>>>>>> stored type is being returned).
>>>>> ... at all.
>>>> Sorry I still do not get it.
>>> Leaving ioreq-server out of the picture, what the function returns
>>> to the caller is the type as it would be if a re-calculation would have
>>> been done on the entry, even if that hasn't happened yet (the
>>> function here shouldn't change any entries after all). I think we
>>> want the same behavior here for what have been ioreq-server
>>> entries (and which would become ram-rw after the next re-calc).
>> How about something like the attached? (In-lined for convenience.)
>>
>> -George
>>
>> After an ioreq server has been unmapped, the remaining
>> p2m_ioreq_server entries need to be reset back to p2m_ram_rw. This
>> patch does this asynchronously, using the current
>> p2m_change_entry_type_global() interface.
>>
>> A new field, entry_count, is introduced in struct p2m_domain to
>> record the number of p2m_ioreq_server p2m page table entries. These
>> entries can only point to 4K-sized page frames, because all
>> p2m_ioreq_server entries originate from p2m_ram_rw ones in
>> p2m_change_type_one(), so we do not need to worry about counting
>> 2M/1G-sized pages.
>>
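[Side sketch: the entry_count bookkeeping described above can be modelled with a small self-contained fragment. The names here (account_type_change, ioreq_entry_count) are stand-ins for illustration, not the real Xen code; only the counting rule is taken from the patch.]

```c
#include <assert.h>

/* Reduced model: only the fields needed to show the counting rule. */
typedef enum { p2m_ram_rw, p2m_ioreq_server } p2m_type_t;

struct p2m_domain {
    unsigned long ioreq_entry_count;   /* models p2m->ioreq.entry_count */
};

/* Every 4K entry that becomes p2m_ioreq_server bumps the count; every
 * entry that stops being p2m_ioreq_server (e.g. on recalc back to
 * p2m_ram_rw) drops it.  Since p2m_ioreq_server entries are only ever
 * created from p2m_ram_rw ones in p2m_change_type_one(), 2M/1G entries
 * never carry this type and need no counting. */
static void account_type_change(struct p2m_domain *p2m,
                                p2m_type_t ot, p2m_type_t nt)
{
    if ( ot == nt )
        return;
    if ( nt == p2m_ioreq_server )
        p2m->ioreq_entry_count++;
    else if ( ot == p2m_ioreq_server )
    {
        assert(p2m->ioreq_entry_count > 0);   /* mirrors the patch's ASSERT() */
        p2m->ioreq_entry_count--;
    }
}
```

With this invariant, a zero count means no outstanding p2m_ioreq_server entries exist anywhere in the p2m, which is what the map/log-dirty checks below rely on.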
>> This patch disallows mapping a new ioreq server while there are still
>> p2m_ioreq_server entries left, to guard against another mapping
>> occurring right after the current one is unmapped and has released
>> its lock, before the p2m table has been synced.
>>
>> This patch also disallows live migration while there are remaining
>> p2m_ioreq_server entries in the p2m table. The core reason is that
>> our current implementation of p2m_change_entry_type_global() lacks
>> the information needed to resync p2m_ioreq_server entries correctly
>> when global_logdirty is on.
>>
>> We still need to handle other recalculations, however. This means
>> that when doing a recalculation, if the current type is
>> p2m_ioreq_server, we check whether p2m->ioreq.server is valid. If it
>> is, we leave the entry as type p2m_ioreq_server; if not, we reset it
>> to p2m_ram as appropriate.
>>
>> To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
>> it for all type recalculations (both in p2m-pt.c and p2m-ept.c).
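[Side sketch: the recalculation rule above, which the lifted helper implements, can be modelled with a reduced stand-alone fragment. The types and the recalc_type_model name are stand-ins, and the log-dirty range check is collapsed to a single global flag; only the decision logic is taken from the patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for Xen's p2m structures. */
typedef enum {
    p2m_ram_rw,
    p2m_ram_logdirty,
    p2m_ioreq_server,
} p2m_type_t;

struct p2m_domain {
    struct {
        void *server;       /* non-NULL while an ioreq server is mapped */
    } ioreq;
    bool global_logdirty;   /* p2m_is_logdirty_range() collapsed to one flag */
};

/* Model of the recalc decision: an entry with a pending recalc whose
 * stored type is p2m_ioreq_server keeps that type only while a server
 * is still mapped; once unmapped it resolves to ram-rw.  Plain ram
 * entries follow the log-dirty setting, as before. */
static p2m_type_t recalc_type_model(bool recalc, p2m_type_t t,
                                    const struct p2m_domain *p2m)
{
    if ( !recalc )
        return t;

    switch ( t )
    {
    case p2m_ioreq_server:
        return p2m->ioreq.server ? p2m_ioreq_server : p2m_ram_rw;
    case p2m_ram_rw:
    case p2m_ram_logdirty:
        return p2m->global_logdirty ? p2m_ram_logdirty : p2m_ram_rw;
    }
    return t;
}
```

This matches the "effective type" behaviour Jan describes for log-dirty: the lookup path reports the type as it would be after recalculation, without modifying any entries.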
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
>> ---
>>   xen/arch/x86/hvm/ioreq.c  |  8 ++++++
>>   xen/arch/x86/mm/hap/hap.c |  9 ++++++
>>   xen/arch/x86/mm/p2m-ept.c | 73 +++++++++++++++++++++++++++++------------------
>>   xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
>>   xen/arch/x86/mm/p2m.c     |  9 ++++++
>>   xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
>>   6 files changed, 135 insertions(+), 48 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index 5bf3b6d..07a6c26 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>  
>>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>  
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
>> +
>>      return rc;
>>  }
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..4b591fe 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>> */
>> static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>> {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> + /*
>> + * Refuse to turn on global log-dirty mode if
>> + * there are outstanding p2m_ioreq_server pages.
>> + */
>> + if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
>> + return -EBUSY;
>> +
>> /* turn on PG_log_dirty bit in paging mode */
>> paging_lock(d);
>> d->arch.paging.mode |= PG_log_dirty;
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index cc1eb21..9f41067 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>          {
>>              for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>>              {
>> +                p2m_type_t nt;
>>                  e = atomic_read_ept_entry(&epte[i]);
>>                  if ( e.emt == MTRR_NUM_TYPES )
>>                      e.emt = 0;
>> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                                            _mfn(e.mfn), 0, &ipat,
>>                                            e.sa_p2mt == p2m_mmio_direct);
>>                  e.ipat = ipat;
>> -                if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>> -                {
>> -                    e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>> -                                ? p2m_ram_logdirty : p2m_ram_rw;
>> +                nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
>> +                if ( nt != e.sa_p2mt ) {
>> +                    if ( e.sa_p2mt == p2m_ioreq_server )
>> +                    {
>> +                        ASSERT(p2m->ioreq.entry_count > 0);
>> +                        p2m->ioreq.entry_count--;
>> +                    }
>> +
>> +                    e.sa_p2mt = nt;
>>                      ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
>>                  }
>>                  e.recalc = 0;
>> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>  
>>              if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>>              {
>> -                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
>> -
>> -                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> -                                                   gfn | ~mask) )
>> -                    {
>> -                    case 0:
>> -                        e.sa_p2mt = p2m_ram_rw;
>> -                        e.recalc = 0;
>> -                        break;
>> -                    case 1:
>> -                        e.sa_p2mt = p2m_ram_logdirty;
>> -                        e.recalc = 0;
>> -                        break;
>> -                    default: /* Force split. */
>> -                        emt = -1;
>> -                        break;
>> -                    }
>> +                unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
>> +
>> +                ASSERT(e.sa_p2mt != p2m_ioreq_server);
>> +                switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> +                                               gfn | ~mask) )
>> +                {
>> +                case 0:
>> +                    e.sa_p2mt = p2m_ram_rw;
>> +                    e.recalc = 0;
>> +                    break;
>> +                case 1:
>> +                    e.sa_p2mt = p2m_ram_logdirty;
>> +                    e.recalc = 0;
>> +                    break;
>> +                default: /* Force split. */
>> +                    emt = -1;
>> +                    break;
>> +                }
>
> So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is
> new, right?
Yes, that's right -- the whole block was indented wrong; when I added
the ASSERT() my editor aligned it correctly, and I couldn't very well
leave it that way. :-)
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Thread overview: 29+ messages
2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2017-04-06 15:53 ` [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
2017-04-06 15:53 ` [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2017-04-07 7:33 ` Tian, Kevin
2017-04-06 15:53 ` [PATCH v12 3/6] x86/ioreq server: Add device model wrappers for new DMOP Yu Zhang
2017-04-06 15:53 ` [PATCH v12 4/6] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
2017-04-06 16:45 ` George Dunlap
2017-04-07 7:34 ` Tian, Kevin
2017-04-07 9:40 ` Jan Beulich
2017-04-07 9:53 ` Yu Zhang
2017-04-07 10:22 ` George Dunlap
2017-04-07 10:22 ` Yu Zhang
2017-04-07 10:41 ` Jan Beulich
2017-04-07 10:26 ` Jan Beulich
2017-04-07 10:55 ` Yu Zhang
2017-04-07 11:31 ` Jan Beulich
2017-04-07 13:56 ` George Dunlap
2017-04-07 14:05 ` Yu Zhang
2017-04-07 14:22 ` George Dunlap [this message]
2017-04-07 14:22 ` Jan Beulich
2017-04-07 10:14 ` Yu Zhang
2017-04-07 10:28 ` Jan Beulich
2017-04-07 10:28 ` George Dunlap
2017-04-07 10:50 ` Yu Zhang
2017-04-07 11:28 ` Jan Beulich
2017-04-07 12:17 ` Yu Zhang
2017-04-07 12:36 ` Jan Beulich
2017-04-06 15:53 ` [PATCH v12 6/6] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang