From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jan Beulich <JBeulich@suse.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Mon, 26 Sep 2016 14:57:58 +0800
Message-ID: <57E8C6F6.3030409@linux.intel.com>
In-Reply-To: <25d252bf-3467-2714-edef-0c79a451df05@citrix.com>



On 9/23/2016 6:35 PM, George Dunlap wrote:
> On 22/09/16 17:02, Yu Zhang wrote:
>>
>> On 9/22/2016 7:32 PM, George Dunlap wrote:
>>> On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang
>>> <yu.c.zhang@linux.intel.com> wrote:
>>>> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>>>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> wrote:
>>>>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>>>> A new HVMOP, HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>>>>> only supports the emulation of write operations, and it can be
>>>>>>>> extended to support the emulation of read operations if an
>>>>>>>> ioreq server has such a requirement in the future.
>>>>>>>>
>>>>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>>>>> of HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>>>>> disclaim ownership of guest RAM pages with p2m_ioreq_server by
>>>>>>>> invoking this new HVMOP with the ioreq server id set to the
>>>>>>>> current owner's and the flags parameter set to 0.
>>>>>>>>
>>>>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>>>>> are only supported for HVMs with HAP enabled.
>>>>>>>>
>>>>>>>> Also note that p2m type changes to p2m_ioreq_server are only
>>>>>>>> allowed after an ioreq server has claimed ownership of the type.
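
A minimal usage sketch (the libxenctrl wrapper and the flag name below
follow this series' naming, but treat them as assumptions rather than a
final API): a device model claims the type, and later disclaims it by
passing flags == 0.

    #include <xenctrl.h>

    /* Claim write emulation for guest pages of type p2m_ioreq_server.
     * Wrapper and flag names are assumptions based on this series. */
    int claim_ioreq_mem(xc_interface *xch, domid_t dom, ioservid_t id)
    {
        return xc_hvm_map_mem_type_to_ioreq_server(xch, dom, id,
                   HVMMEM_ioreq_server,
                   XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE);
    }

    /* flags == 0 disclaims ownership again. */
    int disclaim_ioreq_mem(xc_interface *xch, domid_t dom, ioservid_t id)
    {
        return xc_hvm_map_mem_type_to_ioreq_server(xch, dom, id,
                   HVMMEM_ioreq_server, 0);
    }
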
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>>>>> ---
>>>>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>>>>
>>>>>>>> changes in v6:
>>>>>>>>       - Clarify logic in hvmemul_do_io().
>>>>>>>>       - Use recursive lock for ioreq server lock.
>>>>>>>>       - Remove debug print when mapping ioreq server.
>>>>>>>>       - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>>>>       - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>>>>       - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>>>>         to/from HVMMEM_ram_rw are permitted.
>>>>>>>>       - Add domain_pause/unpause() in
>>>>>>>>         hvm_map_mem_type_to_ioreq_server() to avoid the race
>>>>>>>>         condition when a vm exit happens on a write-protected
>>>>>>>>         page, just to find the ioreq server has been unmapped
>>>>>>>>         already.
>>>>>>>>       - Introduce a separate patch to delay the release of the
>>>>>>>>         p2m lock to avoid the race condition.
>>>>>>>>       - Introduce a separate patch to handle read-modify-write
>>>>>>>>         operations on a write-protected page.
>>>>>>>>
>>>>>>> Why do we need to do this?  Won't the default case just DTRT
>>>>>>> if it finds that the ioreq server has been unmapped?
>>>>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>>>>> entries for recalculation or reset them to p2m_ram_rw directly.
>>>>>> So my understanding is that we do not wish to see an EPT
>>>>>> violation due to a p2m_ioreq_server access after the ioreq
>>>>>> server is unmapped.
>>>>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>>>>> trigger an EPT violation during the hvmop hypercall
>>>>>> (hvm_map_mem_type_to_ioreq_server), just to find the ioreq
>>>>>> server is NULL. Then we would have to provide handlers which
>>>>>> just do the copy to/from actions for the VM. This seems awkward
>>>>>> to me.
>>>>> So the race you're worried about is this:
>>>>>
>>>>> 1. Guest fault happens
>>>>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>>>>> 3. guest finds no ioreq server present
>>>>>
>>>>> I think in that case the easiest thing to do would be to simply assume
>>>>> there was a race and re-execute the instruction.  Is that not possible
>>>>> for some reason?
>>>>>
>>>>>     -George
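
A minimal sketch of that retry idea (the exact hook placement and the
p2m_get_ioreq_server() helper are assumptions; X86EMUL_RETRY is the
emulator's existing "try again" return code):

    /* On an emulation fault against a p2m_ioreq_server page: if no
     * ioreq server currently owns the type, assume we raced with an
     * unmap and make the guest re-execute the instruction; by the time
     * it retries, the p2m type will have been changed back. */
    if ( p2mt == p2m_ioreq_server )
    {
        struct hvm_ioreq_server *s = p2m_get_ioreq_server(currd, &flags);

        if ( s == NULL )
            return X86EMUL_RETRY;
    }
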
>>>> Thanks for your reply, George. :)
>>>> Two reasons I'd like to use domain_pause/unpause() to avoid the
>>>> race condition:
>>>>
>>>> 1> As in my previous explanation: in the read-modify-write
>>>> scenario, the ioreq server will be NULL for the read emulation.
>>>> But in such a case the hypervisor will not discard this trap;
>>>> instead it is supposed to do the copy work for the read access.
>>>> So it would be difficult for the hypervisor to decide whether the
>>>> ioreq server was detached due to a race condition, or whether the
>>>> ioreq server should be NULL because we are emulating a read
>>>> operation first for a read-modify-write instruction.
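
Roughly, the ambiguous spot looks like this (names and placement are
illustrative assumptions, not the actual patch):

    /* Read emulation hits a write-protected p2m_ioreq_server page,
     * but no ioreq server is mapped. */
    if ( dir == IOREQ_READ && s == NULL )
    {
        /* Two indistinguishable cases land here:
         *  (a) the read half of a read-modify-write emulation, where
         *      the hypervisor should simply copy from guest RAM; or
         *  (b) the ioreq server was unmapped in a race, where we would
         *      rather just retry the instruction.
         * Telling them apart would need extra state. */
    }
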
>>> Wouldn't a patch like the attached work (applied on top of the whole
>>> series)?
>> Thanks for your patch, George. I think it should work for 1>. But we
>> still have the deadlock problem. :)
>>
>> BTW, do you think a domain_pause will cause any new problems?
> Well, using a "big hammer" like domain_pause in a case like this
> usually indicates that there are other issues that aren't being solved
> properly -- for instance, latent deadlocks or unhandled race conditions.
> :-)  Leaving those issues around in the codebase but "papered over" by
> domain_pause is storing up technical debt that future generations will
> inherit and need to untangle.  (Particularly as in this case, there was
> no comment *in the code* explaining what problems the domain_pause was
> there to solve, so anyone wanting to try to remove it would just have
> to figure it out.)
>
> domain_pause is relatively expensive to do (since you have to spin
> waiting for all the vcpus to finish running) and completely stops the
> domain from handling interrupts or anything for an arbitrary amount of
> time.  As long as it doesn't happen often, the cost shouldn't be a major
> issue; and as long as the domain_pause is short (less than 100ms), it
> shouldn't cause any more problems than running on a fairly busy system
> would.
>
> On the other hand, if every time we ran into a tricky situation we just
> did a domain pause rather than solving the root issue, pretty soon the
> domain would be paused several times per second.  The performance would
> plummet, and fixing it would be a nightmare because you'd have hundreds
> of undocumented issues to try to understand and fix.
>
> So: the domain_pause itself isn't terrible (although it's better to
> avoid it if we can); what's more of a problem is the potential issues
> that it's hiding.  These issues can add up, so it's important to push
> back and ask "why do we need this and can we solve it a different way"
> pro-actively, as patches come in, rather than waiting until it becomes
> an issue.
>
>   -George
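
If the pause is kept, the documented shape being asked for would be
roughly the following (a sketch only; the p2m_set_ioreq_server() call
and its placement here are assumptions):

    /* Pause the domain so no vcpu can fault on a p2m_ioreq_server
     * entry while the server <-> type binding is changed underneath
     * it (the race described above). */
    domain_pause(d);
    rc = p2m_set_ioreq_server(d, flags, s);
    domain_unpause(d);
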

Thanks for your thorough explanation on this point. And I agree. :)

I have given a proposal to solve this deadlock issue in another mail.
Nested locks are key to the potential deadlock, and I believe they are
not necessary in this case, so taking these locks sequentially could
be our way out.
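
A sketch of what I mean by sequential rather than nested (the lock and
helper names below are placeholders, not the real ones):

    /* Deadlock-prone: the p2m lock nests inside the ioreq server lock
     * here, while another path may take them in the reverse order. */
    spin_lock(&ioreq_server_lock);
    p2m_lock(p2m);
    /* ... */
    p2m_unlock(p2m);
    spin_unlock(&ioreq_server_lock);

    /* Safer: hold one lock at a time, snapshotting under the first
     * lock whatever the second critical section needs. */
    spin_lock(&ioreq_server_lock);
    s = get_ioreq_server(d);            /* snapshot */
    spin_unlock(&ioreq_server_lock);

    p2m_lock(p2m);
    /* ... use s ... */
    p2m_unlock(p2m);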

As to the domain pause, I can remove it if we can accept the
possibility of an EPT violation happening only to find the ioreq
server has already been unmapped (in which case the operation is
discarded).

B.R.
Yu

