Re: [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>, Paul Durrant <paul.durrant@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com,
	Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
Date: Mon, 11 Apr 2016 19:14:38 +0800	[thread overview]
Message-ID: <570B871E.7040703@linux.intel.com> (raw)
In-Reply-To: <57083EA202000078000E623A@prv-mh.provo.novell.com>



On 4/9/2016 6:28 AM, Jan Beulich wrote:
>>>> On 31.03.16 at 12:53, <yu.c.zhang@linux.intel.com> wrote:
>> +static int mem_write(const struct hvm_io_handler *handler,
>> +                     uint64_t addr,
>> +                     uint32_t size,
>> +                     uint64_t data)
>> +{
>> +    struct domain *currd = current->domain;
>> +    unsigned long gmfn = paddr_to_pfn(addr);
>> +    unsigned long offset = addr & ~PAGE_MASK;
>> +    struct page_info *page = get_page_from_gfn(currd, gmfn, NULL, P2M_UNSHARE);
>> +    uint8_t *p;
>> +
>> +    if ( !page )
>> +        return X86EMUL_UNHANDLEABLE;
>> +
>> +    p = __map_domain_page(page);
>> +    p += offset;
>> +    memcpy(p, &data, size);
>
> What if the page is a r/o one? Not having found an ioreq server, I'm
> not sure assumptions on the page being writable can validly be made.
>
>> @@ -168,13 +226,72 @@ static int hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        struct hvm_ioreq_server *s;
>> +        p2m_type_t p2mt;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
>> +
>> +            switch ( p2mt )
>> +            {
>> +                case p2m_ioreq_server:
>> +                {
>> +                    unsigned long flags;
>> +
>> +                    p2m_get_ioreq_server(currd, &flags, &s);
>
> As the function apparently returns no value right now, please avoid
> the indirection on both values you're after - one of the two
> (presumably s) can be the function's return value.
>

Well, current implementation of p2m_get_ioreq_server() has spin_lock/
spin_unlock surrounding the reading of flags and the s, but I believe
we can also use the s as return value.

>> +                    if ( !s )
>> +                        break;
>> +
>> +                    if ( (dir == IOREQ_READ &&
>> +                          !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
>> +                         (dir == IOREQ_WRITE &&
>> +                          !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
>
> I think this would be easier to read using a conditional expression
> with the condition being dir == IOREQ_<one-of-the-two>, just
> selecting either of the two possible bit masks.
>
>> +                        s = NULL;
>> +
>> +                    break;
>> +                }
>> +                case p2m_ram_rw:
>
> Blank line above here please.
>
>>           /* If there is no suitable backing DM, just ignore accesses */
>>           if ( !s )
>>           {
>> -            rc = hvm_process_io_intercept(&null_handler, &p);
>> +            switch ( p2mt )
>> +            {
>> +            case p2m_ioreq_server:
>> +            /*
>> +             * Race conditions may exist when access to a gfn with
>> +             * p2m_ioreq_server is intercepted by hypervisor, during
>> +             * which time p2m type of this gfn is recalculated back
>> +             * to p2m_ram_rw. mem_handler is used to handle this
>> +             * corner case.
>> +             */
>
> Now if there is such a race condition, the race could also be with a
> page changing first to ram_rw and then immediately further to e.g.
> ram_ro. See the earlier comment about assuming the page to be
> writable.
>

Thanks, Jan. After rechecking the code, I suppose the race condition
will not happen. In hvmemul_do_io(), get_gfn_query_unlocked() is used
to peek the p2mt for the gfn, but get_gfn_type_access() is called inside
hvm_hap_nested_page_fault(), and this will guarantee no p2m change shall
occur during the emulation.
Is this understanding correct?


>> +            case p2m_ram_rw:
>> +                rc = hvm_process_io_intercept(&mem_handler, &p);
>> +                break;
>> +
>> +            default:
>> +                rc = hvm_process_io_intercept(&null_handler, &p);
>
> Along with the above, I doubt it is correct to have e.g. ram_ro come
> here.
>

So if no race condition happens, no need to treat p2m_ram_rw specially.

>> +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
>> +                                            ioservid_t id,
>> +                                            hvmmem_type_t type,
>> +                                            uint32_t flags)
>> +{
>> +    struct hvm_ioreq_server *s;
>> +    int rc;
>> +
>> +    /* For now, only HVMMEM_ioreq_server is supported */
>> +    if ( type != HVMMEM_ioreq_server )
>> +        return -EINVAL;
>> +
>> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
>> +                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
>> +        return -EINVAL;
>> +
>> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
>> +
>> +    rc = -ENOENT;
>> +    list_for_each_entry ( s,
>> +                          &d->arch.hvm_domain.ioreq_server.list,
>> +                          list_entry )
>> +    {
>> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>> +            continue;
>> +
>> +        if ( s->id == id )
>> +        {
>> +            rc = p2m_set_ioreq_server(d, flags, s);
>> +            if ( rc == 0 )
>> +                gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
>> +                         s->id, (flags != 0) ? "mapped to" : "unmapped from");
>
> Why gdprintk()? I don't think the current domain is of much
> interest here. What would be of interest is the subject domain.
>

s->id is not the domain_id, but id of the ioreq server.

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain
>> *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
>> +	    /*
>> +	     * write access right is disabled when entry->r is 0, but whether
>> +	     * write accesses are emulated by hypervisor or forwarded to an
>> +	     * ioreq server depends on the setting of p2m->ioreq.flags.
>> +	     */
>> +            entry->w = (entry->r &&
>> +                        !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
>> +            entry->x = entry->r;
>
> Why would we want to allow instruction execution from such pages?
> And with all three bits now possibly being clear, aren't we risking the
> entries to be mis-treated as not-present ones?
>

Hah. You got me. Thanks! :)
Now I realized it would be difficult if we wanna to emulate the read
operations for HVM. According to Intel mannual, entry->r is to be
cleared, so should entry->w if we do not want ept misconfig. And
with both read and write permissions being forbidden, entry->x can be
set only on processors with EXECUTE_ONLY capability.
To avoid any entry to be mis-treated as not-present. We have several
solutions:
a> do not support the read emulation for now - we have no such usage
case;
b> add the check of p2m_t against p2m_ioreq_server in is_epte_present -
a bit weird to me.
Which one do you prefer? or any other suggestions?

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -72,8 +72,8 @@ static const unsigned long pgt[] = {
>>       PGT_l3_page_table
>>   };
>>
>> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>> -                                       unsigned int level)
>> +static unsigned long p2m_type_to_flags(struct p2m_domain *p2m, p2m_type_t t,
>
> const
>
>> +int p2m_set_ioreq_server(struct domain *d,
>> +                         unsigned long flags,
>> +                         struct hvm_ioreq_server *s)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int rc;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    rc = -EBUSY;
>> +    if ( (flags != 0) && (p2m->ioreq.server != NULL) )
>> +        goto out;
>> +
>> +    rc = -EINVAL;
>> +    /* unmap ioreq server from p2m type by passing flags with 0 */
>
> Comment style (also elsewhere).
>
>> +    if ( (flags == 0) && (p2m->ioreq.server != s) )
>> +        goto out;
>
> The two flags checks above are redundant with ...
>
>> +    if ( flags == 0 )
>> +    {
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
>> +    else
>> +    {
>> +        p2m->ioreq.server = s;
>> +        p2m->ioreq.flags = flags;
>> +    }
>
> ... this - I think the earlier ones should be folded into this.
>

Ok. I'll have a try. :)

>> +    /*
>> +     * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
>> +     * we mark the p2m table to be recalculated, so that gfns which were
>> +     * previously marked with p2m_ioreq_server can be resynced.
>> +     */
>> +    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>
> What does "resynced" here mean? I.e. I can see why this is wanted
> when unmapping a server, but when mapping a server there shouldn't
> be any such pages in the first place.
>

There shouldn't be. But if there is(misbehavior from the device model
side), it can be recalculated back to p2m_ram_rw(which is not quite
necessary as the unmapping case).

>> +    rc = 0;
>> +
>> +out:
>
> Labels indented by at least one space please.
>
>> @@ -320,6 +321,27 @@ struct p2m_domain {
>>           struct ept_data ept;
>>           /* NPT-equivalent structure could be added here. */
>>       };
>> +
>> +    struct {
>> +        spinlock_t lock;
>> +        /*
>> +         * ioreq server who's responsible for the emulation of
>> +         * gfns with specific p2m type(for now, p2m_ioreq_server).
>> +         * Behaviors of gfns with p2m_ioreq_server set but no
>> +         * ioreq server mapped in advance should be the same as
>> +         * p2m_ram_rw.
>> +         */
>> +        struct hvm_ioreq_server *server;
>> +        /*
>> +         * flags specifies whether read, write or both operations
>> +         * are to be emulated by an ioreq server.
>> +         */
>> +        unsigned long flags;
>
> unsigned int
>
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -489,6 +489,43 @@ struct xen_hvm_altp2m_op {
>>   typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>
>> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
>
> Instead of adding yet another such section, couldn't this be added
> to an already existing one?
>

Yes.

>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>> +    domid_t domid;      /* IN - domain to be serviced */
>> +    ioservid_t id;      /* IN - ioreq server id */
>> +    hvmmem_type_t type; /* IN - memory type */
>
> You can't use this type for public interface structure fields - this
> must be uintXX_t.
>

Got it.


Thanks
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2016-04-11 11:14 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-31 10:53 [PATCH v2 0/3] x86/ioreq server: introduce HVMMEM_ioreq_server mem type Yu Zhang
2016-03-31 10:53 ` [PATCH v2 1/3] x86/ioreq server: Add new functions to get/set memory types Yu Zhang
2016-04-05 13:57   ` George Dunlap
2016-04-05 14:08     ` George Dunlap
2016-04-08 13:25   ` Andrew Cooper
2016-03-31 10:53 ` [PATCH v2 2/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
2016-04-05 14:38   ` George Dunlap
2016-04-08 13:26   ` Andrew Cooper
2016-04-08 21:48   ` Jan Beulich
2016-04-18  8:41     ` Paul Durrant
2016-04-18  9:10       ` George Dunlap
2016-04-18  9:14         ` Wei Liu
2016-04-18  9:45           ` Paul Durrant
2016-04-18 16:40       ` Jan Beulich
2016-04-18 16:45         ` Paul Durrant
2016-04-18 16:47           ` Jan Beulich
2016-04-18 16:58             ` Paul Durrant
2016-04-19 11:02               ` Yu, Zhang
2016-04-19 11:15                 ` Paul Durrant
2016-04-19 11:38                   ` Yu, Zhang
2016-04-19 11:50                     ` Paul Durrant
2016-04-19 16:51                     ` Jan Beulich
2016-04-20 14:59                       ` Wei Liu
2016-04-20 15:02                 ` George Dunlap
2016-04-20 16:30                   ` George Dunlap
2016-04-20 16:52                     ` Jan Beulich
2016-04-20 16:58                       ` Paul Durrant
2016-04-20 17:06                         ` George Dunlap
2016-04-20 17:09                           ` Paul Durrant
2016-04-21 12:24                           ` Yu, Zhang
2016-04-21 13:31                             ` Paul Durrant
2016-04-21 13:48                               ` Yu, Zhang
2016-04-21 13:56                                 ` Paul Durrant
2016-04-21 14:09                                   ` George Dunlap
2016-04-20 17:08                       ` George Dunlap
2016-04-21 12:04                       ` Yu, Zhang
2016-03-31 10:53 ` [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
     [not found]   ` <20160404082556.GC28633@deinos.phlegethon.org>
2016-04-05  6:01     ` Yu, Zhang
2016-04-06 17:13   ` George Dunlap
2016-04-07  7:01     ` Yu, Zhang
     [not found]       ` <CAFLBxZbLp2zWzCzQTaJNWbanQSmTJ57ZyTh0qaD-+YUn8o8pyQ@mail.gmail.com>
2016-04-08 10:39         ` George Dunlap
     [not found]         ` <5707839F.9060803@linux.intel.com>
2016-04-08 11:01           ` George Dunlap
2016-04-11 11:15             ` Yu, Zhang
2016-04-14 10:45               ` Yu, Zhang
2016-04-18 15:57                 ` Paul Durrant
2016-04-19  9:11                   ` Yu, Zhang
2016-04-19  9:21                     ` Paul Durrant
2016-04-19  9:44                       ` Yu, Zhang
2016-04-19 10:05                         ` Paul Durrant
2016-04-19 11:17                           ` Yu, Zhang
2016-04-19 11:47                             ` Paul Durrant
2016-04-19 11:59                               ` Yu, Zhang
2016-04-20 14:50                                 ` George Dunlap
2016-04-20 14:57                                   ` Paul Durrant
2016-04-20 15:37                                     ` George Dunlap
2016-04-20 16:30                                       ` Paul Durrant
2016-04-20 16:58                                         ` George Dunlap
2016-04-21 13:28                                         ` Yu, Zhang
2016-04-21 13:21                                   ` Yu, Zhang
2016-04-22 11:27                                     ` Wei Liu
2016-04-22 11:30                                       ` George Dunlap
2016-04-19  4:37                 ` Tian, Kevin
2016-04-19  9:21                   ` Yu, Zhang
2016-04-08 13:33   ` Andrew Cooper
2016-04-11 11:14     ` Yu, Zhang
2016-04-11 12:20       ` Andrew Cooper
2016-04-11 16:25         ` Jan Beulich
2016-04-08 22:28   ` Jan Beulich
2016-04-11 11:14     ` Yu, Zhang [this message]
2016-04-11 16:31       ` Jan Beulich
2016-04-12  9:37         ` Yu, Zhang
2016-04-12 15:08           ` Jan Beulich
2016-04-14  9:56             ` Yu, Zhang
2016-04-19  4:50               ` Tian, Kevin
2016-04-19  8:46                 ` Paul Durrant
2016-04-19  9:27                   ` Yu, Zhang
2016-04-19  9:40                     ` Paul Durrant
2016-04-19  9:49                       ` Yu, Zhang
2016-04-19 10:01                         ` Paul Durrant
2016-04-19  9:54                           ` Yu, Zhang
2016-04-19  9:15                 ` Yu, Zhang
2016-04-19  9:23                   ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570B871E.7040703@linux.intel.com \
    --to=yu.c.zhang@linux.intel.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=paul.durrant@citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xen.org \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.