From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>,
"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
Jan Beulich <JBeulich@suse.com>,
"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Mon, 26 Sep 2016 14:57:58 +0800
Message-ID: <57E8C6F6.3030409@linux.intel.com>
In-Reply-To: <25d252bf-3467-2714-edef-0c79a451df05@citrix.com>
On 9/23/2016 6:35 PM, George Dunlap wrote:
> On 22/09/16 17:02, Yu Zhang wrote:
>>
>> On 9/22/2016 7:32 PM, George Dunlap wrote:
>>> On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang
>>> <yu.c.zhang@linux.intel.com> wrote:
>>>> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>>>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> wrote:
>>>>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>>>>> only supports the emulation of write operations, but it can be
>>>>>>>> further extended to support the emulation of reads if an
>>>>>>>> ioreq server has such a requirement in the future.
>>>>>>>>
>>>>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>>>>> to HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server
>>>>>>>> by triggering this new HVMOP, with the ioreq server id set to the
>>>>>>>> current owner's and the flags parameter set to 0.
>>>>>>>>
>>>>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>>>>> are only supported for HVMs with HAP enabled.
>>>>>>>>
>>>>>>>> Also note that p2m type changes to p2m_ioreq_server are only
>>>>>>>> allowed after an ioreq server has claimed ownership of the type.
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>>>>> ---
>>>>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>>>>
>>>>>>>> changes in v6:
>>>>>>>> - Clarify logic in hvmemul_do_io().
>>>>>>>> - Use recursive lock for ioreq server lock.
>>>>>>>> - Remove debug print when mapping ioreq server.
>>>>>>>> - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>>>> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>>>> - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>>>> to/from HVMMEM_ram_rw are permitted.
>>>>>>>> - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>>>> to avoid the race condition when a vm exit happens on a
>>>>>>>> write-protected page, just to find the ioreq server has been
>>>>>>>> unmapped already.
>>>>>>>> - Introduce a separate patch to delay the release of the p2m
>>>>>>>> lock to avoid the race condition.
>>>>>>>> - Introduce a separate patch to handle read-modify-write
>>>>>>>> operations on a write-protected page.
>>>>>>>>
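(To make the interface above concrete, a minimal sketch of the hypercall
argument it describes; the field names and layout here are illustrative,
and the patch itself is authoritative:

    struct xen_hvm_map_mem_type_to_ioreq_server {
        domid_t domid;       /* IN - domain to be serviced */
        ioservid_t id;       /* IN - ioreq server id */
        uint16_t type;       /* IN - memory type to claim,
                                e.g. HVMMEM_ioreq_server */
        uint32_t flags;      /* IN - which accesses to forward;
                                write-only today, 0 disclaims */
    };

A device model claims the type by issuing the HVMOP with the write flag
set, and disclaims it by issuing the HVMOP again with flags set to 0.)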
>>>>>>> Why do we need to do this? Won't the default case just DTRT if it
>>>>>>> finds that the ioreq server has been unmapped?
>>>>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>>>>> entries as "recalc" or reset them to p2m_ram_rw directly. So my
>>>>>> understanding is that we do not wish to see an ept violation due to
>>>>>> a p2m_ioreq_server access after the ioreq server is unmapped.
>>>>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>>>>> trigger an ept violation during the hvmop hypercall
>>>>>> (hvm_map_mem_type_to_ioreq_server), only to find the ioreq server
>>>>>> is NULL. Then we would have to provide handlers which just do the
>>>>>> copy to/from actions for the VM. This seems awkward to me.
>>>>> So the race you're worried about is this:
>>>>>
>>>>> 1. Guest fault happens
>>>>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>>>>> 3. guest finds no ioreq server present
>>>>>
>>>>> I think in that case the easiest thing to do would be to simply assume
>>>>> there was a race and re-execute the instruction. Is that not possible
>>>>> for some reason?
>>>>>
>>>>> -George
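(For concreteness, a minimal sketch of the "assume a race and re-execute"
approach George suggests; the helper p2m_get_ioreq_server() and the exact
placement in the emulation path are assumptions here, not his attached
patch:

    /* If the page is p2m_ioreq_server but no ioreq server currently
     * claims the type, assume we raced with an unmap and make the
     * guest re-execute the instruction. */
    if ( p2mt == p2m_ioreq_server )
    {
        unsigned int flags;
        struct hvm_ioreq_server *s = p2m_get_ioreq_server(currd, &flags);

        if ( s == NULL )
            return X86EMUL_RETRY;
    }

By the time the instruction is re-executed, the p2m type change performed
by the unmap should have settled.)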
>>>> Thanks for your reply, George. :)
>>>> Two reasons why I'd like to use domain_pause/unpause() to avoid the
>>>> race condition:
>>>>
>>>> 1> As explained previously, in the read-modify-write scenario the
>>>> ioreq server will be NULL for the read emulation. But in that case
>>>> the hypervisor will not discard this trap; instead it is supposed to
>>>> do the copy work for the read access. So it would be difficult for
>>>> the hypervisor to decide whether the ioreq server was detached due to
>>>> a race condition, or whether the ioreq server should be NULL because
>>>> we are emulating a read operation first for a read-modify-write
>>>> instruction.
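(In other words, the intended split for a read-modify-write access to a
write-protected p2m_ioreq_server page is roughly the following; the names
are illustrative of the idea, not the actual code of patch 3:

    /* The read half of an RMW is completed by the hypervisor itself,
     * as a plain RAM read ... */
    if ( dir == IOREQ_READ )
        rc = hvm_copy_from_guest_phys(p_data, addr, size);
    else
        /* ... while the write half is forwarded to the ioreq server. */
        rc = hvm_send_ioreq(s, &p, 0);

So a NULL ioreq server during a read is expected in the RMW case, and
cannot by itself be taken as evidence of a race.)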
>>> Wouldn't a patch like the attached work (applied on top of the whole
>>> series)?
>> Thanks for your patch, George. I think it should work for 1>. But we
>> still have the deadlock problem. :)
>>
>> BTW, do you think a domain_pause will cause any new problems?
> Well, using a "big hammer" like domain_pause in a case like this
> usually indicates that there are other issues that aren't being solved
> properly -- for instance, latent deadlocks or unhandled race conditions.
> :-) Leaving those issues around in the codebase but "papered over" by
> domain_pause is storing up technical debt that future generations will
> inherit and need to untangle. (Particularly as in this case, there was
> no comment *in the code* explaining what problems the domain_pause was
> there to solve, so anyone wanting to remove it would need to just
> figure it out.)
>
> domain_pause is relatively expensive to do (since you have to spin
> waiting for all the vcpus to finish running) and completely stops the
> domain from handling interrupts or anything for an arbitrary amount of
> time. As long as it doesn't happen often, the cost shouldn't be a major
> issue; and as long as the domain_pause is short (less than 100ms), it
> shouldn't cause any more problems than running on a fairly busy system
> would.
>
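(For reference, the pattern being weighed here is essentially:

    domain_pause(d);     /* spins until every vcpu is descheduled */
    /* ... flip the ioreq server mapping while no vcpu can fault ... */
    domain_unpause(d);

so its cost is dominated by waiting for all vcpus to come to rest.)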
> On the other hand, if every time we ran into a tricky situation we just
> did a domain pause rather than solving the root issue, pretty soon the
> domain would be paused several times per second. The performance would
> plummet, and fixing it would be a nightmare because you'd have hundreds
> of undocumented issues to try to understand and fix.
>
> So: the domain_pause itself isn't terrible (although it's better to
> avoid it if we can); what's more of a problem is the potential issues
> that it's hiding. These issues can add up, so it's important to push
> back and ask "why do we need this and can we solve it a different way"
> pro-actively, as patches come in, rather than waiting until it becomes
> an issue.
>
> -George
Thanks for your thorough explanation on this point. And I agree. :)
I have proposed a solution to this deadlock issue in another mail. The
nested locks are the key to the potential deadlock, and I believe they
are not necessary in this case, so taking these locks sequentially could
be our way out.
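Something along these lines, where the lock names are only illustrative
of the idea:

    /* Take the two locks sequentially rather than nested: the nesting
     * is what creates the lock-ordering hazard, so releasing the first
     * lock before acquiring the second removes it. */
    spin_lock(&ioreq_server_lock);
    s = get_ioreq_server(d, id);
    spin_unlock(&ioreq_server_lock);

    p2m_lock(p2m);
    /* ... update the p2m_ioreq_server ownership ... */
    p2m_unlock(p2m);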
As to the domain pause, I can remove it if we can accept the possibility
that an ept violation happens only to find the ioreq server has already
been unmapped (in which case the operation is discarded).
B.R.
Yu