From: "Mihai Donțu" <mdontu@bitdefender.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Adalbert Lazar" <alazar@bitdefender.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Tamas K Lengyel" <tamas.k.lengyel@gmail.com>,
"Andrei Vlad LUTAS" <vlutas@bitdefender.com>
Subject: Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection
Date: Mon, 07 Aug 2017 19:44:03 +0300 [thread overview]
Message-ID: <1502124243.27693.210.camel@bitdefender.com> (raw)
In-Reply-To: <57ae89d0-f62c-c4c3-3122-7e3ed054dfd7@redhat.com>
On Mon, 2017-08-07 at 17:56 +0200, Paolo Bonzini wrote:
> On 07/08/2017 16:12, Mihai Donțu wrote:
> > On Mon, 2017-08-07 at 15:49 +0200, Paolo Bonzini wrote:
> > > On 07/08/2017 15:25, Mihai Donțu wrote:
> > > > > "Pause all VCPUs and stop all DMA" would definitely be a layering
> > > > > violation, so it cannot be added.
> > > > >
> > > > > "Pause all VCPUs" is basically a shortcut for many "pause the VCPU with
> > > > > a given id" commands. I lean towards omitting it.
> > > >
> > > > The case where the introspector wants to scan the guest memory needs a
> > > > KVMI_PAUSE_VM, which as discussed in a previous email, can be the
> > > > actual qemu 'pause' command.
> > >
> > > Do you mean it needs to stop DMA as well?
> >
> > No, DMA can proceed normally. I remain of the opinion that KVMI users
> > must know what guest memory ranges are OK to access by looking at MTRR-
> > s, PAT or guest kernel structures, or a combination of all three.
>
> Ok, good. Sorry if I am dense on the DMA/no-DMA cases.
I think it's OK to restate things, especially since my (our) view on
these matters might not match the KVM reality that you know far
better.
> (But, I don't understand your remark about guest memory ranges; the point of
> bus-master DMA is that it works on any memory, and cache snooping makes
> it even easier for hypothetical malware to do memory writes via
> bus-master DMA).
This is where I separated things in my head: I merely limited myself to
accessing memory, while leaving the reality of DMA-based attacks a
problem to be solved separately. There is some reasearch work being
tested internally on that, but I have yet to get in touch with the
people involved in it. As soon as I get some details maybe we can
connect something in KVM.
> > > > However, we would like to limit the
> > > > communication channels we have with the host and not use qmp (or
> > > > libvirt/etc. if qmp is not exposed). Instead, have a command that
> > > > triggers a KVM_RUN exit to qemu which in turn will call the underlying
> > > > pause function used by qmp. Would that be OK with you?
> > >
> > > You would have to send back something on completion, and then I am
> > > worried of races and deadlocks. Plus, pausing a VM at the QEMU level is
> > > a really expensive operation, so I don't think it's a good idea to let
> > > the introspector do this. You can pause all VCPUs, or use memory page
> > > permissions.
> >
> > Pausing all vCPU-s was my first thought, I was just trying to follow
> > your statement: "I lean towards omitting it". :-)
>
> Yes, and I still do because a hypothetical "pause all VCPUs" command
> still has the issue that you could get other events before the command
> completes. So I am not convinced that a specialized command actually
> makes the introspector code much simpler.
>
> I hope you understand that I want to keep the trusted base (not just the
> code I maintain, though that is a secondary benefit ;)) as simple as
> possible.
>
> > It will take a bit of user-space-fu, in that after issuing N vCPU pause
> > commands in a row we will have to wait for N events, which might race
> > with other events (MSR, CRx etc.) which need handling otherwise the
> > pause ones will not arrive
>
> The same issue would be there in QEMU or KVM though.
>
> If you can always request "pause all vCPUs" from an event handler,
> avoiding deadlocks is relatively easy. If you cannot ensure that, for
> example because of work that is scheduled periodically, you can send a
> KVM_PAUSE command to ensure the work is done in a safe condition.
>
> Then you get the following pseudocode algorithm:
>
> // a vCPU is not executing guest code, and it's going to check
> // num_pause_vm_requests before going back to guest code
> vcpu_not_running(id) {
> unmark vCPU "id" as running
> if (num vcpus running == 0)
> cond_broadcast(no_running_cpus)
> }
>
> pause_vcpu(id) {
> mark vCPU "id" as being-paused
> send KVMI_PAUSED for the vcpu
> }
>
> // return only when no vCPU is in KVM_RUN
> pause_vm() {
> if this vCPU is running
> if not in an event handler
> // caller should do pause_vcpu and defer the work
> return
>
> // we know this vCPU is not KVM_RUN
> vcpu_not_running()
>
> num_pause_vm_requests++
> if (num vcpus running > 0)
> for each vCPU that is running and not being-paused
> pause_vcpu(id)
> while (num vcpus running > 0)
> cond_wait(no_running_vcpus)
> }
>
> // tell paused vCPUs that they can resume
> resume_vm() {
> num_pause_vm_requests--
> if (num_pause_all_requests == 0)
> cond_broadcast(no_pending_pause_vm_requests)
> // either we're in an event handler, or a "pause" command was
> // sent for this vCPU. in any case we're guaranteed to do an
> // event_reply sooner or later, which will again mark the vCPU
> // as running
> }
>
> // after an event reply, the vCPU goes back to KVM_RUN. therefore
> // an event reply can act as a synchronization point for pause-vm
> // requests: delay the reply if there's such a request
> event_reply(id, data) {
> if (num_pause_vm_requests > 0) {
> if vCPU "id" is running
> vcpu_not_running(id)
> while (num_pause_vm_requests > 0)
> cond_wait(no_pending_pause_vm_requests)
> }
> mark vCPU "id" as running
> send event reply on KVMI socket
> }
>
> // this is what you do when KVM tells you that the guest is either
> // in userspace, or waiting to be woken up ("paused" event). from
> // the introspector POV the two are the same.
> vcpu_ack_pause(id) {
> vcpu_not_running(id)
> unmark vCPU "id" as being-paused
>
> // deferred work presumably calls pause_vm/resumve_vm, and this
> // vCPU is not running now, so this is a nice point to flush it
> if any deferred work exists, do it now
> }
>
> and on the KVMI read handler:
>
> on reply to "pause" command:
> if reply says the vCPU is currently in userspace
> // we'll get a KVMI_PAUSED event as soon as the host
> // reenters KVM with KVM_RUN, but we can already say the
> // CPU is not running
> vcpu_ack_pause()
>
> on "paused" event:
> vcpu_ack_pause()
> event_reply()
Thank you for this!
--
Mihai Donțu
next prev parent reply other threads:[~2017-08-07 16:44 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-07 14:34 [RFC PATCH v2 0/1] VM introspection Adalbert Lazar
2017-07-07 14:34 ` [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for " Adalbert Lazar
2017-07-07 16:52 ` Paolo Bonzini
2017-07-10 15:32 ` alazar
2017-07-10 17:03 ` Paolo Bonzini
2017-07-11 16:48 ` Adalbert Lazar
2017-07-11 16:51 ` Paolo Bonzini
2017-07-13 5:57 ` Mihai Donțu
2017-07-13 7:32 ` Paolo Bonzini
2017-07-18 11:51 ` Mihai Donțu
2017-07-18 12:02 ` Mihai Donțu
2017-07-23 13:02 ` Paolo Bonzini
2017-07-26 17:04 ` Mihai Donțu
2017-07-26 17:25 ` Tamas K Lengyel
2017-07-27 14:41 ` Mihai Donțu
2017-07-27 13:33 ` Paolo Bonzini
2017-07-27 14:46 ` Mihai Donțu
2017-07-13 8:36 ` Mihai Donțu
2017-07-13 9:15 ` Paolo Bonzini
2017-07-27 16:23 ` Mihai Donțu
2017-07-27 16:52 ` Paolo Bonzini
2017-07-27 17:19 ` Mihai Donțu
2017-08-01 10:40 ` Paolo Bonzini
2017-08-01 16:33 ` Tamas K Lengyel
2017-08-01 20:47 ` Paolo Bonzini
2017-08-02 11:52 ` Mihai Donțu
2017-08-02 12:27 ` Paolo Bonzini
2017-08-02 13:32 ` Mihai Donțu
2017-08-02 13:51 ` Paolo Bonzini
2017-08-02 14:17 ` Mihai Donțu
2017-08-04 8:35 ` Paolo Bonzini
2017-08-04 15:29 ` Mihai Donțu
2017-08-04 15:37 ` Paolo Bonzini
2017-08-05 8:00 ` Andrei Vlad LUTAS
2017-08-07 12:18 ` Paolo Bonzini
2017-08-07 13:25 ` Mihai Donțu
2017-08-07 13:49 ` Paolo Bonzini
2017-08-07 14:12 ` Mihai Donțu
2017-08-07 15:56 ` Paolo Bonzini
2017-08-07 16:44 ` Mihai Donțu [this message]
2017-08-02 13:53 ` Mihai Donțu
2017-07-27 17:06 ` Mihai Donțu
2017-07-27 17:18 ` Paolo Bonzini
2017-07-07 17:29 ` [RFC PATCH v2 0/1] " Paolo Bonzini
2017-08-07 15:28 ` Mihai Donțu
2017-08-07 15:44 ` Paolo Bonzini
2017-07-12 14:09 ` Konrad Rzeszutek Wilk
2017-07-13 5:37 ` Mihai Donțu
2017-07-14 16:13 ` Konrad Rzeszutek Wilk
2017-07-18 8:55 ` Mihai Donțu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1502124243.27693.210.camel@bitdefender.com \
--to=mdontu@bitdefender.com \
--cc=alazar@bitdefender.com \
--cc=jan.kiszka@siemens.com \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=stefanha@redhat.com \
--cc=tamas.k.lengyel@gmail.com \
--cc=vlutas@bitdefender.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.