From: "Mihai Donțu" <mdontu@bitdefender.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Adalbert Lazar" <alazar@bitdefender.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Tamas K Lengyel" <tamas.k.lengyel@gmail.com>,
"Andrei Vlad LUTAS" <vlutas@bitdefender.com>
Subject: Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection
Date: Mon, 07 Aug 2017 19:44:03 +0300
Message-ID: <1502124243.27693.210.camel@bitdefender.com>
In-Reply-To: <57ae89d0-f62c-c4c3-3122-7e3ed054dfd7@redhat.com>
On Mon, 2017-08-07 at 17:56 +0200, Paolo Bonzini wrote:
> On 07/08/2017 16:12, Mihai Donțu wrote:
> > On Mon, 2017-08-07 at 15:49 +0200, Paolo Bonzini wrote:
> > > On 07/08/2017 15:25, Mihai Donțu wrote:
> > > > > "Pause all VCPUs and stop all DMA" would definitely be a layering
> > > > > violation, so it cannot be added.
> > > > >
> > > > > "Pause all VCPUs" is basically a shortcut for many "pause the VCPU with
> > > > > a given id" commands. I lean towards omitting it.
> > > >
> > > > The case where the introspector wants to scan the guest memory needs a
> > > > KVMI_PAUSE_VM, which as discussed in a previous email, can be the
> > > > actual qemu 'pause' command.
> > >
> > > Do you mean it needs to stop DMA as well?
> >
> > No, DMA can proceed normally. I remain of the opinion that KVMI users
> > must know what guest memory ranges are OK to access by looking at
> > MTRRs, PAT or guest kernel structures, or a combination of all three.
>
> Ok, good. Sorry if I am dense on the DMA/no-DMA cases.
I think it's OK to restate things, especially since my (our) view on
these matters might not match the KVM reality that you know far
better.
> (But, I don't understand your remark about guest memory ranges; the point of
> bus-master DMA is that it works on any memory, and cache snooping makes
> it even easier for hypothetical malware to do memory writes via
> bus-master DMA).
This is where I separated things in my head: I merely limited myself to
accessing memory, while leaving the reality of DMA-based attacks a
problem to be solved separately. There is some research work being
tested internally on that, but I have yet to get in touch with the
people involved in it. As soon as I have some details, maybe we can
connect something in KVM.
> > > > However, we would like to limit the
> > > > communication channels we have with the host and not use qmp (or
> > > > libvirt/etc. if qmp is not exposed). Instead, have a command that
> > > > triggers a KVM_RUN exit to qemu which in turn will call the underlying
> > > > pause function used by qmp. Would that be OK with you?
> > >
> > > You would have to send back something on completion, and then I am
> > > worried of races and deadlocks. Plus, pausing a VM at the QEMU level is
> > > a really expensive operation, so I don't think it's a good idea to let
> > > the introspector do this. You can pause all VCPUs, or use memory page
> > > permissions.
> >
> > Pausing all vCPU-s was my first thought, I was just trying to follow
> > your statement: "I lean towards omitting it". :-)
>
> Yes, and I still do because a hypothetical "pause all VCPUs" command
> still has the issue that you could get other events before the command
> completes. So I am not convinced that a specialized command actually
> makes the introspector code much simpler.
>
> I hope you understand that I want to keep the trusted base (not just the
> code I maintain, though that is a secondary benefit ;)) as simple as
> possible.
>
> > It will take a bit of user-space-fu, in that after issuing N vCPU pause
> > commands in a row we will have to wait for N events, which might race
> > with other events (MSR, CRx etc.) which need handling, otherwise the
> > pause ones will not arrive.
>
> The same issue would be there in QEMU or KVM though.
>
> If you can always request "pause all vCPUs" from an event handler,
> avoiding deadlocks is relatively easy. If you cannot ensure that, for
> example because of work that is scheduled periodically, you can send a
> KVM_PAUSE command to ensure the work is done in a safe condition.
>
> Then you get the following pseudocode algorithm:
>
> // a vCPU is not executing guest code, and it's going to check
> // num_pause_vm_requests before going back to guest code
> vcpu_not_running(id) {
> unmark vCPU "id" as running
> if (num vcpus running == 0)
> cond_broadcast(no_running_vcpus)
> }
>
> pause_vcpu(id) {
> mark vCPU "id" as being-paused
> send KVMI_PAUSED for the vcpu
> }
>
> // return only when no vCPU is in KVM_RUN
> pause_vm() {
> if this vCPU is running
> if not in an event handler
> // caller should do pause_vcpu and defer the work
> return
>
> // we know this vCPU is not KVM_RUN
> vcpu_not_running()
>
> num_pause_vm_requests++
> if (num vcpus running > 0)
> for each vCPU that is running and not being-paused
> pause_vcpu(id)
> while (num vcpus running > 0)
> cond_wait(no_running_vcpus)
> }
>
> // tell paused vCPUs that they can resume
> resume_vm() {
> num_pause_vm_requests--
> if (num_pause_vm_requests == 0)
> cond_broadcast(no_pending_pause_vm_requests)
> // either we're in an event handler, or a "pause" command was
> // sent for this vCPU. in any case we're guaranteed to do an
> // event_reply sooner or later, which will again mark the vCPU
> // as running
> }
>
> // after an event reply, the vCPU goes back to KVM_RUN. therefore
> // an event reply can act as a synchronization point for pause-vm
> // requests: delay the reply if there's such a request
> event_reply(id, data) {
> if (num_pause_vm_requests > 0) {
> if vCPU "id" is running
> vcpu_not_running(id)
> while (num_pause_vm_requests > 0)
> cond_wait(no_pending_pause_vm_requests)
> }
> mark vCPU "id" as running
> send event reply on KVMI socket
> }
>
> // this is what you do when KVM tells you that the guest is either
> // in userspace, or waiting to be woken up ("paused" event). from
> // the introspector POV the two are the same.
> vcpu_ack_pause(id) {
> vcpu_not_running(id)
> unmark vCPU "id" as being-paused
>
> // deferred work presumably calls pause_vm/resume_vm, and this
> // vCPU is not running now, so this is a nice point to flush it
> if any deferred work exists, do it now
> }
>
> and on the KVMI read handler:
>
> on reply to "pause" command:
> if reply says the vCPU is currently in userspace
> // we'll get a KVMI_PAUSED event as soon as the host
> // reenters KVM with KVM_RUN, but we can already say the
> // CPU is not running
> vcpu_ack_pause()
>
> on "paused" event:
> vcpu_ack_pause()
> event_reply()
Thank you for this!
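To convince myself the synchronization is sound, I put together a small
model of it in Python (threading.Condition standing in for the
cond_wait/cond_broadcast pairs; the KVMI socket traffic is replaced by
direct method calls, and all names are only illustrative, not part of
any real KVMI API):

```python
import threading

class VmPauseModel:
    """Toy model of the pause_vm/resume_vm algorithm above. One
    condition variable protects the vCPU bookkeeping and serves both
    the no_running_vcpus and no_pending_pause_vm_requests waits."""

    def __init__(self, vcpu_ids):
        self.cond = threading.Condition()
        self.running = set(vcpu_ids)   # vCPUs currently in KVM_RUN
        self.pause_requests = 0        # num_pause_vm_requests

    def _vcpu_not_running(self, vcpu):
        # caller must hold self.cond
        self.running.discard(vcpu)
        if not self.running:
            self.cond.notify_all()     # cond_broadcast(no_running_vcpus)

    def pause_vm(self):
        # return only when no vCPU is in KVM_RUN
        with self.cond:
            self.pause_requests += 1
            while self.running:
                self.cond.wait()

    def resume_vm(self):
        with self.cond:
            self.pause_requests -= 1
            if self.pause_requests == 0:
                # cond_broadcast(no_pending_pause_vm_requests)
                self.cond.notify_all()

    def event_reply(self, vcpu):
        # delay the reply while a pause-vm request is pending, so the
        # reply acts as the synchronization point for pause_vm()
        with self.cond:
            if self.pause_requests > 0:
                self._vcpu_not_running(vcpu)
                while self.pause_requests > 0:
                    self.cond.wait()
            self.running.add(vcpu)     # vCPU goes back to KVM_RUN

    def vcpu_ack_pause(self, vcpu):
        # "paused" event (or a pause-command reply saying the host is
        # currently in userspace)
        with self.cond:
            self._vcpu_not_running(vcpu)

# two vCPUs ack their pause from another thread while pause_vm() waits
m = VmPauseModel({0, 1})
acker = threading.Thread(
    target=lambda: (m.vcpu_ack_pause(0), m.vcpu_ack_pause(1)))
acker.start()
m.pause_vm()               # returns only after both vCPUs stopped
assert not m.running
m.resume_vm()
m.event_reply(0)           # vCPU 0 replies and goes back to KVM_RUN
acker.join()
```

It also shows why the event reply has to recheck the pending-request
count in a loop: a broadcast wakes every waiter, and only the ones
whose condition actually holds may proceed.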
--
Mihai Donțu