Re: [PATCH RFC V9 4/5] xen, libxc: Request page fault injection via libxc

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Razvan Cojocaru <rcojocaru@bitdefender.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: kevin.tian@intel.com, eddie.dong@intel.com,
	stefano.stabellini@eu.citrix.com, andrew.cooper3@citrix.com,
	tim@xen.org, jun.nakajima@intel.com,
	xen-devel@lists.xenproject.org, ian.jackson@eu.citrix.com,
	ian.campbell@citrix.com
Subject: Re: [PATCH RFC V9 4/5] xen, libxc: Request page fault injection via libxc
Date: Tue, 02 Sep 2014 12:18:06 +0300	[thread overview]
Message-ID: <54058B4E.9060001@bitdefender.com> (raw)
In-Reply-To: <54047D1D020000780002F73A@mail.emea.novell.com>

On 09/01/2014 03:05 PM, Jan Beulich wrote:
>>>> On 01.09.14 at 13:54, <rcojocaru@bitdefender.com> wrote:
>> On 09/01/2014 12:08 PM, Jan Beulich wrote:
>>>>>> On 01.09.14 at 09:36, <rcojocaru@bitdefender.com> wrote:
>>>> On 08/29/2014 12:27 PM, Jan Beulich wrote:
>>>>>>>> On 29.08.14 at 09:44, <rcojocaru@bitdefender.com> wrote:
>>>>>> I do understand the preference for a VCPU-based mechanism from a
>>>>>> concurrency point of view, but that would simply potentially fail for
>>>>>> us, hence defeating the purpose of the patch. I'm also not sure how that
>>>>>> would be useful in the general case either, since the same problem that
>>>>>> applies to us would seem to apply to the general case as well.
>>>>>
>>>>> Yeah, the whole thing probably needs a bit more thinking so that the
>>>>> interface doesn't end up being a BitDefender-special. Indeed together
>>>>> with the address space qualification, the interface might not be very
>>>>> useful when made vCPU-bound. And taking it a little further into the
>>>>> "generic" direction, allowing this to only inject #PF doesn't make a
>>>>> very nice interface either. Plus we already have HVMOP_inject_trap,
>>>>> i.e. your first line of thinking (and eventual explaining as the
>>>>> motivation for a patch) should be why that can't be used.
>>>>
>>>> I'd say that it's memory-introspection specific rather than 3rd-party
>>>> vendor specific. Without this this patch, memory-introspection support
>>>> in general is impacted / less flexible, since there's no other way to
>>>> bring swapped out pages back in.
>>>>
>>>> For all the reasons you've explained (at least as far as I understand
>>>> it) there's not much room to go more generic - so maybe just renaming
>>>> the libxc wrapper to something more specific (
>>>> xc_domain_request_usermode_pagefault?) is the solution here?
>>>
>>> Maybe, but only after you explained why the existing interface can
>>> neither be used nor suitably extended.
>>
>> As far as I understand the HVMOP_inject_trap interface, it is simply (in
>> this case) a way to trigger the equivalent of hvm_inject_page_fault()
>> from userspace (via libxc).
>>
>> We need two additional checks:
>>
>> 1. That CR3 matches, because the way the application works, we need to
>> inject a page fault related to the address space of whatever process is
>> interesting at the time, and
>>
>> 2. That we're in usermode (the CPL check), because we know that it's
>> safe to inject a page fault when we're in user mode, but it's not always
>> safe to do so in kernel mode.
>>
>> The current interface seems to be a low-level, basic wrapper around
>> hvm_inject_trap().
>>
>> What we're trying to do is ask for a page fault when we're both A) in
>> usermode, and B) when a target process matches - and are only interested
>> in page faults, no other trap vector.
>>
>> Technically, the checks could, probably, be moved into (and new
>> parameters added to) hvm_inject_trap() & friends, but it seems unlikely
>> that users other than memory introspection applications would be
>> interested in them, while they would possibly add to the complexity of
>> the interface. The rest of the clients would have to carry dummy
>> parameters around to use it.
> 
> I'm not sure: Injecting faults with certain restrictions (like in your
> case CPL and CR3) does seem to make quite some sense in
> general when the fault isn't intended to be terminal (of a process
> or the entire VM). It's just that so far we used fault injection only
> to indicate error conditions.
> 
> But as always - let's see if others are of differing opinion before
> settling on either route.

While waiting for other opinions, I've started looking at the
HVMOP_inject_trap code, and it uses a per-VCPU data-saving model similar
to the one we've been discussing (and agreed that does not work for the
explained use-cases).

The first giveaway is the libxc function:

/*
 * Injects a hardware/software CPU trap, to take effect the next time
the HVM
 * resumes.
 */
int xc_hvm_inject_trap(
    xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
    uint32_t type, uint32_t error_code, uint32_t insn_len,
    uint64_t cr2);

which takes an "int vcpu" parameter. Then, this happens in
xen/arch/x86/hvm/hvm.c:

    case HVMOP_inject_trap:
    {
        xen_hvm_inject_trap_t tr;
        struct domain *d;
        struct vcpu *v;

        if ( copy_from_guest(&tr, arg, 1 ) )
            return -EFAULT;

        rc = rcu_lock_remote_domain_by_id(tr.domid, &d);
        if ( rc != 0 )
            return rc;

        rc = -EINVAL;
        if ( !is_hvm_domain(d) )
            goto param_fail8;

        rc = xsm_hvm_control(XSM_DM_PRIV, d, op);
        if ( rc )
            goto param_fail8;

        rc = -ENOENT;
        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
            goto param_fail8;

        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
            rc = -EBUSY;
        else
        {
            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
            v->arch.hvm_vcpu.inject_trap.type = tr.type;
            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
            rc = 0;
        }

    param_fail8:
        rcu_unlock_domain(d);
        break;
    }

The "v->arch.hvm_vcpu.inject_trap.<member> = initializer;" code again
points to a per-VCPU implementation.

And finally, hvm_do_resume() is also VCPU-dependent:

void hvm_do_resume(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_ioreq_server *s;

    check_wakeup_from_wait();

    if ( is_hvm_vcpu(v) )
        pt_restore_timer(v);

    list_for_each_entry ( s,
                          &d->arch.hvm_domain.ioreq_server.list,
                          list_entry )
    {
        struct hvm_ioreq_vcpu *sv;

        list_for_each_entry ( sv,
                              &s->ioreq_vcpu_list,
                              list_entry )
        {
            if ( sv->vcpu == v )
            {
                if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )
                    return;

                break;
            }
        }
    }

    /* Inject pending hw/sw trap */
    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
    {
        hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap);
        v->arch.hvm_vcpu.inject_trap.vector = -1;
    }
}

While we need to set the data per-domain and have whatever VCPU inject
the page fault - _but_only_if_ it's in usermode and its CR3 points to
something interesting.

Hope I've been following the code correctly.


Thanks,
Razvan Cojocaru

next prev parent reply	other threads:[~2014-09-02  9:18 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-28 11:47 [PATCH RFC V9 1/5] xen: Emulate with no writes Razvan Cojocaru
2014-08-28 11:47 ` [PATCH RFC V9 2/5] xen: Optimize introspection access to guest state Razvan Cojocaru
2014-08-28 11:48 ` [PATCH RFC V9 3/5] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
2014-08-28 11:48 ` [PATCH RFC V9 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
2014-08-28 12:03   ` Jan Beulich
2014-08-28 12:08     ` Razvan Cojocaru
2014-08-28 12:11       ` Jan Beulich
2014-08-28 12:23         ` Razvan Cojocaru
2014-08-28 12:37         ` Razvan Cojocaru
2014-08-29  7:44         ` Razvan Cojocaru
2014-08-29  9:27           ` Jan Beulich
2014-09-01  7:36             ` Razvan Cojocaru
2014-09-01  9:08               ` Jan Beulich
2014-09-01 11:54                 ` Razvan Cojocaru
2014-09-01 12:05                   ` Jan Beulich
2014-09-02  9:18                     ` Razvan Cojocaru [this message]
2014-09-02  9:33                       ` Jan Beulich
2014-09-02  9:44                         ` Razvan Cojocaru
2014-09-02 10:08                           ` Jan Beulich
2014-09-02 13:24                       ` Tim Deegan
2014-09-09 16:57                         ` George Dunlap
2014-09-09 17:39                           ` Razvan Cojocaru
2014-09-09 18:38                             ` Tamas K Lengyel
2014-09-10  8:09                               ` Razvan Cojocaru
2014-09-10  8:48                                 ` Andrew Cooper
2014-09-10  8:55                                   ` Razvan Cojocaru
2014-09-10  9:34                                     ` Andrew Cooper
2014-09-10 10:39                                     ` George Dunlap
2014-09-10 10:49                                       ` Razvan Cojocaru
2014-09-09 20:14                           ` Tim Deegan
2014-09-10  9:30                             ` Razvan Cojocaru
2014-09-10  9:59                               ` Tamas K Lengyel
2014-09-10 10:44                               ` Tim Deegan
2014-08-28 11:48 ` [PATCH RFC V9 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
2014-08-28 12:09   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54058B4E.9060001@bitdefender.com \
    --to=rcojocaru@bitdefender.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=eddie.dong@intel.com \
    --cc=ian.campbell@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.