From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jan Beulich <JBeulich@suse.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Mon, 26 Sep 2016 14:58:07 +0800
Message-ID: <57E8C6FF.3050708@linux.intel.com>
In-Reply-To: <57E3A06E.1020000@linux.intel.com>



On 9/22/2016 5:12 PM, Yu Zhang wrote:
>
>
> On 9/21/2016 9:04 PM, George Dunlap wrote:
>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com> 
>> wrote:
>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>> A new HVMOP, HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>> let an ioreq server claim/disclaim its responsibility for the
>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>> only supports the emulation of write operations, and it can be
>>>>> further extended to support the emulation of read ones if an
>>>>> ioreq server has such a requirement in the future.
>>>>>
>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>> once an ioreq server has claimed ownership, subsequent calls of
>>>>> HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>> disclaim the ownership of guest RAM pages with p2m_ioreq_server
>>>>> by triggering this new HVMOP with the ioreq server id set to the
>>>>> current owner's and the flags parameter set to 0.
>>>>>
>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>> are only supported for HVMs with HAP enabled.
>>>>>
>>>>> Also note that only after one ioreq server claims its ownership
>>>>> of p2m_ioreq_server will the p2m type change to p2m_ioreq_server
>>>>> be allowed.
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>> ---
>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>
>>>>> changes in v6:
>>>>>     - Clarify logic in hvmemul_do_io().
>>>>>     - Use recursive lock for ioreq server lock.
>>>>>     - Remove debug print when mapping ioreq server.
>>>>>     - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>     - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>     - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>       to/from HVMMEM_ram_rw are permitted.
>>>>>     - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>       to avoid the race condition when a vm exit happens on a write-
>>>>>       protected page, just to find the ioreq server has been unmapped
>>>>>       already.
>>>>>     - Introduce a separate patch to delay the release of the p2m
>>>>>       lock to avoid the race condition.
>>>>>     - Introduce a separate patch to handle the read-modify-write
>>>>>       operations on a write protected page.
>>>>>
>>>> Why do we need to do this?  Won't the default case just DTRT if it
>>>> finds that the ioreq server has been unmapped?
>>>
>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>> entries as "recalc" or reset them to p2m_ram_rw directly. So my
>>> understanding is that we do not wish to see an EPT violation due
>>> to a p2m_ioreq_server access after the ioreq server is unmapped.
>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>> trigger an EPT violation during the HVMOP hypercall
>>> (hvm_map_mem_type_to_ioreq_server), only to find that the ioreq
>>> server is NULL. Then we would have to provide handlers which just
>>> do the copy to/from actions for the VM. This seems awkward to me.
>> So the race you're worried about is this:
>>
>> 1. Guest fault happens
>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>> 3. Guest finds no ioreq server present
>>
>> I think in that case the easiest thing to do would be to simply assume
>> there was a race and re-execute the instruction.  Is that not possible
>> for some reason?
>>
>>   -George
>
> Thanks for your reply, George. :)
> Two reasons I'd like to use the domain_pause/unpause() pair to avoid
> the race condition:
>
> 1> As I explained previously, in the read-modify-write scenario the
> ioreq server will be NULL for the read emulation. But in that case the
> hypervisor will not discard this trap; instead it is supposed to do
> the copy work for the read access. So it would be difficult for the
> hypervisor to decide whether the ioreq server was detached due to a
> race condition, or whether it should be NULL because we are emulating
> the read operation first for a read-modify-write instruction.
>
> 2> I also realized it can avoid a possible deadlock:
>     a> dom0 triggers map_mem_type_to_ioreq_server, which takes the
>        ioreq_server.lock, and before hvm_map_mem_type_to_ioreq_server()
>        returns,
>     b> the HVM triggers an EPT violation, and the p2m lock is taken by
>        hvm_hap_nested_page_fault();
>     c> the hypervisor continues to the I/O handler, which needs the
>        ioreq_server.lock to select an ioreq server; that lock is still
>        held by the hypercall handler side;
>     d> likewise, map_mem_type_to_ioreq_server runs into trouble when
>        trying to take the p2m lock when it calls
>        p2m_change_entry_type_global().
>    With a domain_pause/unpause() pair, we could avoid the EPT violation
> during this period (which I believe won't be long, because this pair
> only covers the p2m_change_entry_type_global() call, not the
> p2m_finish_type_change() call).

Sorry, the deadlock issue is actually between the p2m->ioreq.lock and
the p2m lock, which are both taken inside p2m_set_ioreq_server(), called
by hvm_map_mem_type_to_ioreq_server(). The sequence when the deadlock
happens is similar to the above description:

a> dom0 triggers map_mem_type_to_ioreq_server, to unmap the
   p2m_ioreq_server type;
b> the HVM triggers an EPT violation, and the p2m lock is taken by
   hvm_hap_nested_page_fault();
c> the hypervisor continues to the I/O handler, which needs the
   p2m->ioreq.lock to select an ioreq server for the write-protected
   address; that lock is still held by the hypercall handler side;
d> likewise, p2m_set_ioreq_server() runs into trouble when trying to
   take the p2m lock when it calls p2m_change_entry_type_global().
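
To make the lock inversion concrete, below is a minimal stand-alone
model of the two paths, written with pthread mutexes purely for
illustration. The two mutexes only stand in for the p2m->ioreq.lock and
the p2m lock; none of this is actual Xen code:

/*
 * A self-contained model of the AB-BA inversion described in a> - d>
 * above. With unlucky timing each thread blocks forever on the other
 * thread's second lock -- i.e. the deadlock.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ioreq_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for p2m->ioreq.lock */
static pthread_mutex_t p2m_lock   = PTHREAD_MUTEX_INITIALIZER; /* stands in for the p2m lock */

/* Path 1: the hypercall side, mirroring p2m_set_ioreq_server(). */
static void *hypercall_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ioreq_lock);  /* step a: unmap under the ioreq lock */
    pthread_mutex_lock(&p2m_lock);    /* step d: p2m_change_entry_type_global() */
    puts("hypercall path: both locks taken");
    pthread_mutex_unlock(&p2m_lock);
    pthread_mutex_unlock(&ioreq_lock);
    return NULL;
}

/* Path 2: the EPT-violation side, mirroring hvm_hap_nested_page_fault(). */
static void *fault_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&p2m_lock);    /* step b: fault handler takes the p2m lock */
    pthread_mutex_lock(&ioreq_lock);  /* step c: I/O handler selects an ioreq server */
    puts("fault path: both locks taken");
    pthread_mutex_unlock(&ioreq_lock);
    pthread_mutex_unlock(&p2m_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* Run the two paths concurrently; the opposite lock orders above
     * are what make the deadlock possible. */
    pthread_create(&t1, NULL, hypercall_path, NULL);
    pthread_create(&t2, NULL, fault_path, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}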

Now I believe we can avoid this deadlock by moving the
p2m_change_entry_type_global() call out of p2m_set_ioreq_server() into
its caller, hvm_map_mem_type_to_ioreq_server(). That way the
p2m->ioreq.lock and the p2m lock are never nested. :)

A code snippet:
@@ -944,6 +944,13 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
          if ( s->id == id )
          {
              rc = p2m_set_ioreq_server(d, flags, s);
+            if ( rc == 0 && flags == 0 )
+            {
+                struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+                if ( read_atomic(&p2m->ioreq.entry_count) )
+                    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+            }
              break;
          }
      }
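
For completeness, here is a hypothetical caller-side sketch of the
claim/disclaim semantics described in the commit message of patch 1.
It assumes the libxc wrapper and flag name this series introduces,
xc_hvm_map_mem_type_to_ioreq_server() and
XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE; if the final version uses different
identifiers, treat these as placeholders:

/* Hypothetical device-model-side usage (identifiers are placeholders,
 * see above): the ioreq server first claims write emulation for
 * p2m_ioreq_server pages, and later disclaims it with flags == 0. */
#include <xenctrl.h>

static int claim_ioreq_mem_type(xc_interface *xch, domid_t domid,
                                ioservid_t id)
{
    /* Claim write emulation for pages of type p2m_ioreq_server. */
    return xc_hvm_map_mem_type_to_ioreq_server(xch, domid, id,
                                               HVMMEM_ioreq_server,
                                               XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE);
}

static int disclaim_ioreq_mem_type(xc_interface *xch, domid_t domid,
                                   ioservid_t id)
{
    /* flags == 0: the current owner disclaims ownership; with the
     * snippet above, any outstanding p2m_ioreq_server entries are then
     * reset to p2m_ram_rw via p2m_change_entry_type_global(). */
    return xc_hvm_map_mem_type_to_ioreq_server(xch, domid, id,
                                               HVMMEM_ioreq_server, 0);
}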

Thanks
Yu

>
> Thanks
> Yu


