From: "Mihai Donțu" <mdontu@bitdefender.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Adalbert Lazar" <alazar@bitdefender.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Tamas K Lengyel" <tamas.k.lengyel@gmail.com>,
"Andrei Vlad LUTAS" <vlutas@bitdefender.com>
Subject: Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection
Date: Mon, 07 Aug 2017 19:44:03 +0300
Message-ID: <1502124243.27693.210.camel@bitdefender.com>
In-Reply-To: <57ae89d0-f62c-c4c3-3122-7e3ed054dfd7@redhat.com>
On Mon, 2017-08-07 at 17:56 +0200, Paolo Bonzini wrote:
> On 07/08/2017 16:12, Mihai Donțu wrote:
> > On Mon, 2017-08-07 at 15:49 +0200, Paolo Bonzini wrote:
> > > On 07/08/2017 15:25, Mihai Donțu wrote:
> > > > > "Pause all VCPUs and stop all DMA" would definitely be a layering
> > > > > violation, so it cannot be added.
> > > > >
> > > > > "Pause all VCPUs" is basically a shortcut for many "pause the VCPU with
> > > > > a given id" commands. I lean towards omitting it.
> > > >
> > > > The case where the introspector wants to scan the guest memory needs a
> > > > KVMI_PAUSE_VM, which as discussed in a previous email, can be the
> > > > actual qemu 'pause' command.
> > >
> > > Do you mean it needs to stop DMA as well?
> >
> > No, DMA can proceed normally. I remain of the opinion that KVMI users
> > must know what guest memory ranges are OK to access by looking at
> > MTRRs, PAT or guest kernel structures, or a combination of all three.
>
> Ok, good. Sorry if I am dense on the DMA/no-DMA cases.
I think it's OK to restate things, especially since my (our) view on
these matters might not match the KVM reality that you know far
better.
> (But, I don't understand your remark about guest memory ranges; the point of
> bus-master DMA is that it works on any memory, and cache snooping makes
> it even easier for hypothetical malware to do memory writes via
> bus-master DMA).
This is where I separated things in my head: I merely limited myself to
accessing memory, while leaving the reality of DMA-based attacks a
problem to be solved separately. There is some research work being
tested internally on that, but I have yet to get in touch with the
people involved in it. As soon as I have some details, maybe we can
connect something in KVM.
> > > > However, we would like to limit the
> > > > communication channels we have with the host and not use qmp (or
> > > > libvirt/etc. if qmp is not exposed). Instead, have a command that
> > > > triggers a KVM_RUN exit to qemu which in turn will call the underlying
> > > > pause function used by qmp. Would that be OK with you?
> > >
> > > You would have to send back something on completion, and then I am
> > > worried of races and deadlocks. Plus, pausing a VM at the QEMU level is
> > > a really expensive operation, so I don't think it's a good idea to let
> > > the introspector do this. You can pause all VCPUs, or use memory page
> > > permissions.
> >
> > Pausing all vCPU-s was my first thought, I was just trying to follow
> > your statement: "I lean towards omitting it". :-)
>
> Yes, and I still do because a hypothetical "pause all VCPUs" command
> still has the issue that you could get other events before the command
> completes. So I am not convinced that a specialized command actually
> makes the introspector code much simpler.
>
> I hope you understand that I want to keep the trusted base (not just the
> code I maintain, though that is a secondary benefit ;)) as simple as
> possible.
>
> > It will take a bit of user-space-fu, in that after issuing N vCPU pause
> > commands in a row we will have to wait for N events, which might race
> > with other events (MSR, CRx etc.) which need handling, otherwise the
> > pause ones will not arrive.
>
> The same issue would be there in QEMU or KVM though.
>
> If you can always request "pause all vCPUs" from an event handler,
> avoiding deadlocks is relatively easy. If you cannot ensure that, for
> example because of work that is scheduled periodically, you can send a
> KVM_PAUSE command to ensure the work is done in a safe condition.
>
> Then you get the following pseudocode algorithm:
>
> // a vCPU is not executing guest code, and it's going to check
> // num_pause_vm_requests before going back to guest code
> vcpu_not_running(id) {
> unmark vCPU "id" as running
> if (num vcpus running == 0)
> cond_broadcast(no_running_vcpus)
> }
>
> pause_vcpu(id) {
> mark vCPU "id" as being-paused
> send KVMI_PAUSED for the vcpu
> }
>
> // return only when no vCPU is in KVM_RUN
> pause_vm() {
> if this vCPU is running
> if not in an event handler
> // caller should do pause_vcpu and defer the work
> return
>
> // we know this vCPU is not KVM_RUN
> vcpu_not_running()
>
> num_pause_vm_requests++
> if (num vcpus running > 0)
> for each vCPU that is running and not being-paused
> pause_vcpu(id)
> while (num vcpus running > 0)
> cond_wait(no_running_vcpus)
> }
>
> // tell paused vCPUs that they can resume
> resume_vm() {
> num_pause_vm_requests--
> if (num_pause_vm_requests == 0)
> cond_broadcast(no_pending_pause_vm_requests)
> // either we're in an event handler, or a "pause" command was
> // sent for this vCPU. in any case we're guaranteed to do an
> // event_reply sooner or later, which will again mark the vCPU
> // as running
> }
>
> // after an event reply, the vCPU goes back to KVM_RUN. therefore
> // an event reply can act as a synchronization point for pause-vm
> // requests: delay the reply if there's such a request
> event_reply(id, data) {
> if (num_pause_vm_requests > 0) {
> if vCPU "id" is running
> vcpu_not_running(id)
> while (num_pause_vm_requests > 0)
> cond_wait(no_pending_pause_vm_requests)
> }
> mark vCPU "id" as running
> send event reply on KVMI socket
> }
>
> // this is what you do when KVM tells you that the guest is either
> // in userspace, or waiting to be woken up ("paused" event). from
> // the introspector POV the two are the same.
> vcpu_ack_pause(id) {
> vcpu_not_running(id)
> unmark vCPU "id" as being-paused
>
> // deferred work presumably calls pause_vm/resume_vm, and this
> // vCPU is not running now, so this is a nice point to flush it
> if any deferred work exists, do it now
> }
>
> and on the KVMI read handler:
>
> on reply to "pause" command:
> if reply says the vCPU is currently in userspace
> // we'll get a KVMI_PAUSED event as soon as the host
> // reenters KVM with KVM_RUN, but we can already say the
> // CPU is not running
> vcpu_ack_pause()
>
> on "paused" event:
> vcpu_ack_pause()
> event_reply()
Thank you for this!
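To convince myself the synchronization is sound, I put together a small
model of it in Python (threading.Condition standing in for the
cond_wait/cond_broadcast pairs; the KVMI socket traffic is replaced by
direct method calls, and all names are only illustrative, not part of
any real KVMI API):

```python
import threading

class VmPauseModel:
    """Toy model of the pause_vm/resume_vm algorithm above. One
    condition variable protects the vCPU bookkeeping and serves both
    the no_running_vcpus and no_pending_pause_vm_requests waits."""

    def __init__(self, vcpu_ids):
        self.cond = threading.Condition()
        self.running = set(vcpu_ids)   # vCPUs currently in KVM_RUN
        self.pause_requests = 0        # num_pause_vm_requests

    def _vcpu_not_running(self, vcpu):
        # caller must hold self.cond
        self.running.discard(vcpu)
        if not self.running:
            self.cond.notify_all()     # cond_broadcast(no_running_vcpus)

    def pause_vm(self):
        # return only when no vCPU is in KVM_RUN
        with self.cond:
            self.pause_requests += 1
            while self.running:
                self.cond.wait()

    def resume_vm(self):
        with self.cond:
            self.pause_requests -= 1
            if self.pause_requests == 0:
                # cond_broadcast(no_pending_pause_vm_requests)
                self.cond.notify_all()

    def event_reply(self, vcpu):
        # delay the reply while a pause-vm request is pending, so the
        # reply acts as the synchronization point for pause_vm()
        with self.cond:
            if self.pause_requests > 0:
                self._vcpu_not_running(vcpu)
                while self.pause_requests > 0:
                    self.cond.wait()
            self.running.add(vcpu)     # vCPU goes back to KVM_RUN

    def vcpu_ack_pause(self, vcpu):
        # "paused" event (or a pause-command reply saying the host is
        # currently in userspace)
        with self.cond:
            self._vcpu_not_running(vcpu)

# two vCPUs ack their pause from another thread while pause_vm() waits
m = VmPauseModel({0, 1})
acker = threading.Thread(
    target=lambda: (m.vcpu_ack_pause(0), m.vcpu_ack_pause(1)))
acker.start()
m.pause_vm()               # returns only after both vCPUs stopped
assert not m.running
m.resume_vm()
m.event_reply(0)           # vCPU 0 replies and goes back to KVM_RUN
acker.join()
```

It also shows why the event reply has to recheck the pending-request
count in a loop: a broadcast wakes every waiter, and only the ones
whose condition actually holds may proceed.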
--
Mihai Donțu