From: Mihai Donțu
Subject: Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection
Date: Mon, 07 Aug 2017 19:44:03 +0300
Message-ID: <1502124243.27693.210.camel@bitdefender.com>
In-Reply-To: <57ae89d0-f62c-c4c3-3122-7e3ed054dfd7@redhat.com>
To: Paolo Bonzini
Cc: Radim Krčmář, Jan Kiszka, Stefan Hajnoczi, Adalbert Lazar,
 kvm@vger.kernel.org, Tamas K Lengyel, Andrei Vlad LUTAS

On Mon, 2017-08-07 at 17:56 +0200, Paolo Bonzini wrote:
> On 07/08/2017 16:12, Mihai Donțu wrote:
> > On Mon, 2017-08-07 at 15:49 +0200, Paolo Bonzini wrote:
> > > On 07/08/2017 15:25, Mihai Donțu wrote:
> > > > > "Pause all VCPUs and stop all DMA" would definitely be a
> > > > > layering violation, so it cannot be added.
> > > > > 
> > > > > "Pause all VCPUs" is basically a shortcut for many "pause the
> > > > > VCPU with a given id" commands.  I lean towards omitting it.
> > > > 
> > > > The case where the introspector wants to scan the guest memory
> > > > needs a KVMI_PAUSE_VM, which, as discussed in a previous email,
> > > > can be the actual qemu 'pause' command.
> > > 
> > > Do you mean it needs to stop DMA as well?
> > 
> > No, DMA can proceed normally. I remain of the opinion that KVMI
> > users must know what guest memory ranges are OK to access by
> > looking at MTRRs, PAT or guest kernel structures, or a combination
> > of all three.
> 
> Ok, good.  Sorry if I am dense on the DMA/no-DMA cases.

I think it's OK to restate things, especially since my (our) view on
these matters might not match the KVM reality that you know far
better.

> (But I don't understand your remark about guest memory ranges; the
> point of bus-master DMA is that it works on any memory, and cache
> snooping makes it even easier for hypothetical malware to do memory
> writes via bus-master DMA.)

This is where I separated things in my head: I merely limited myself
to accessing memory, while leaving the reality of DMA-based attacks as
a problem to be solved separately. There is some research work being
tested internally on that, but I have yet to get in touch with the
people involved in it. As soon as I get some details, maybe we can
connect something in KVM.
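To make the memory-range point above a bit more concrete, a check on
the introspector side before touching a guest physical page could look
roughly like the sketch below. It is only a sketch: kvmi_get_msr() is
a placeholder for whatever guest-MSR-read command the final API ends
up with, and it consults just the variable-range MTRRs, ignoring the
fixed ranges, the default type, PAT and the guest kernel's own
structures:

/*
 * Sketch only: kvmi_get_msr() is a placeholder, not an existing KVMI
 * command, and error handling is mostly elided.
 */
#include <stdbool.h>
#include <stdint.h>

#define MSR_MTRRcap		0xfe
#define MSR_MTRRphysBase0	0x200
#define MSR_MTRRphysMask0	0x201
#define MTRR_TYPE_WRBACK	6

int kvmi_get_msr(int vcpu, uint32_t msr, uint64_t *val); /* hypothetical */

/*
 * Returns true if gpa falls inside a variable-range MTRR marked
 * write-back, i.e. memory the introspector can reasonably treat as
 * plain RAM.
 */
static bool gpa_is_writeback(int vcpu, uint64_t gpa)
{
	uint64_t cap, base, mask;
	unsigned int i, count;

	if (kvmi_get_msr(vcpu, MSR_MTRRcap, &cap) < 0)
		return false;
	count = cap & 0xff;	/* VCNT: number of variable ranges */

	for (i = 0; i < count; i++) {
		kvmi_get_msr(vcpu, MSR_MTRRphysBase0 + 2 * i, &base);
		kvmi_get_msr(vcpu, MSR_MTRRphysMask0 + 2 * i, &mask);

		if (!(mask & (1ull << 11)))	/* range not enabled */
			continue;
		if ((gpa & mask & ~0xfffull) != (base & mask & ~0xfffull))
			continue;		/* gpa not covered by this range */

		return (base & 0xff) == MTRR_TYPE_WRBACK;
	}
	return false;	/* fall back to default type / PAT / guest structures */
}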
> > > > However, we would like to limit the communication channels we
> > > > have with the host and not use qmp (or libvirt/etc. if qmp is
> > > > not exposed). Instead, have a command that triggers a KVM_RUN
> > > > exit to qemu, which in turn will call the underlying pause
> > > > function used by qmp. Would that be OK with you?
> > > 
> > > You would have to send back something on completion, and then I
> > > am worried about races and deadlocks.  Plus, pausing a VM at the
> > > QEMU level is a really expensive operation, so I don't think it's
> > > a good idea to let the introspector do this.  You can pause all
> > > VCPUs, or use memory page permissions.
> > 
> > Pausing all vCPUs was my first thought; I was just trying to follow
> > your statement: "I lean towards omitting it". :-)
> 
> Yes, and I still do, because a hypothetical "pause all VCPUs" command
> still has the issue that you could get other events before the
> command completes.  So I am not convinced that a specialized command
> actually makes the introspector code much simpler.
> 
> I hope you understand that I want to keep the trusted base (not just
> the code I maintain, though that is a secondary benefit ;)) as simple
> as possible.
> 
> > It will take a bit of user-space-fu, in that after issuing N vCPU
> > pause commands in a row we will have to wait for N events, which
> > might race with other events (MSR, CRx etc.) that need handling,
> > otherwise the pause ones will not arrive.
> 
> The same issue would be there in QEMU or KVM though.
> 
> If you can always request "pause all vCPUs" from an event handler,
> avoiding deadlocks is relatively easy.  If you cannot ensure that,
> for example because of work that is scheduled periodically, you can
> send a KVM_PAUSE command to ensure the work is done in a safe
> condition.
> 
> Then you get the following pseudocode algorithm:
> 
>     // a vCPU is not executing guest code, and it's going to check
>     // num_pause_vm_requests before going back to guest code
>     vcpu_not_running(id) {
>         unmark vCPU "id" as running
>         if (num vcpus running == 0)
>             cond_broadcast(no_running_vcpus)
>     }
> 
>     pause_vcpu(id) {
>         mark vCPU "id" as being-paused
>         send KVMI_PAUSED for the vcpu
>     }
> 
>     // return only when no vCPU is in KVM_RUN
>     pause_vm() {
>         if this vCPU is running
>             if not in an event handler
>                 // caller should do pause_vcpu and defer the work
>                 return
> 
>             // we know this vCPU is not in KVM_RUN
>             vcpu_not_running()
> 
>         num_pause_vm_requests++
>         if (num vcpus running > 0)
>             for each vCPU that is running and not being-paused
>                 pause_vcpu(id)
>             while (num vcpus running > 0)
>                 cond_wait(no_running_vcpus)
>     }
> 
>     // tell paused vCPUs that they can resume
>     resume_vm() {
>         num_pause_vm_requests--
>         if (num_pause_vm_requests == 0)
>             cond_broadcast(no_pending_pause_vm_requests)
>         // either we're in an event handler, or a "pause" command was
>         // sent for this vCPU.  in any case we're guaranteed to do an
>         // event_reply sooner or later, which will again mark the
>         // vCPU as running
>     }
> 
>     // after an event reply, the vCPU goes back to KVM_RUN.  therefore
>     // an event reply can act as a synchronization point for pause-vm
>     // requests: delay the reply if there's such a request
>     event_reply(id, data) {
>         if (num_pause_vm_requests > 0) {
>             if vCPU "id" is running
>                 vcpu_not_running(id)
>             while (num_pause_vm_requests > 0)
>                 cond_wait(no_pending_pause_vm_requests)
>         }
>         mark vCPU "id" as running
>         send event reply on KVMI socket
>     }
> 
>     // this is what you do when KVM tells you that the guest is
>     // either in userspace, or waiting to be woken up ("paused"
>     // event).  from the introspector POV the two are the same.
>     vcpu_ack_pause(id) {
>         vcpu_not_running(id)
>         unmark vCPU "id" as being-paused
> 
>         // deferred work presumably calls pause_vm/resume_vm, and
>         // this vCPU is not running now, so this is a nice point to
>         // flush it
>         if any deferred work exists, do it now
>     }
> 
> and on the KVMI read handler:
> 
>     on reply to "pause" command:
>         if reply says the vCPU is currently in userspace
>             // we'll get a KVMI_PAUSED event as soon as the host
>             // reenters KVM with KVM_RUN, but we can already say the
>             // CPU is not running
>             vcpu_ack_pause()
> 
>     on "paused" event:
>         vcpu_ack_pause()
>         event_reply()

Thank you for this!
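If it helps, the pause_vm()/event_reply() synchronization could be
spelled out in C with pthreads roughly as in the sketch below. It is
only a sketch: MAX_VCPUS and the send_*() functions are placeholders
(not actual KVMI API), the deferred-work handling is left out, and the
per-vCPU bookkeeping is kept in plain arrays:

/*
 * Sketch of the algorithm above using pthreads.  MAX_VCPUS and the
 * send_*() functions are placeholders, not actual KVMI API.
 */
#include <pthread.h>
#include <stdbool.h>

#define MAX_VCPUS 64	/* assumption for the sketch */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t no_running_vcpus = PTHREAD_COND_INITIALIZER;
static pthread_cond_t no_pending_pause_vm_requests = PTHREAD_COND_INITIALIZER;

static bool running[MAX_VCPUS];		/* vCPU is (headed) in KVM_RUN */
static bool being_paused[MAX_VCPUS];	/* a pause command was sent for it */
static int num_running_vcpus;
static int num_pause_vm_requests;

void send_pause_vcpu_cmd(int id);			/* stub: write pause command */
void send_event_reply(int id, const void *data);	/* stub: write event reply */

/* caller holds 'lock' */
static void vcpu_not_running(int id)
{
	if (!running[id])
		return;
	running[id] = false;
	if (--num_running_vcpus == 0)
		pthread_cond_broadcast(&no_running_vcpus);
}

/* call from the event handler of vCPU 'self'; returns when no vCPU is in KVM_RUN */
void pause_vm(int self)
{
	int id;

	pthread_mutex_lock(&lock);
	vcpu_not_running(self);		/* parked in a handler, not in KVM_RUN */
	num_pause_vm_requests++;
	for (id = 0; id < MAX_VCPUS; id++)
		if (running[id] && !being_paused[id]) {
			being_paused[id] = true;
			send_pause_vcpu_cmd(id);
		}
	while (num_running_vcpus > 0)
		pthread_cond_wait(&no_running_vcpus, &lock);
	pthread_mutex_unlock(&lock);
}

void resume_vm(void)
{
	pthread_mutex_lock(&lock);
	if (--num_pause_vm_requests == 0)
		pthread_cond_broadcast(&no_pending_pause_vm_requests);
	pthread_mutex_unlock(&lock);
	/* the pending event_reply() will mark this vCPU as running again */
}

/* delay the reply while a pause-vm request is in flight */
void event_reply(int id, const void *data)
{
	pthread_mutex_lock(&lock);
	if (num_pause_vm_requests > 0) {
		vcpu_not_running(id);
		while (num_pause_vm_requests > 0)
			pthread_cond_wait(&no_pending_pause_vm_requests, &lock);
	}
	if (!running[id]) {
		running[id] = true;
		num_running_vcpus++;
	}
	pthread_mutex_unlock(&lock);
	send_event_reply(id, data);
}

/* on a pause-command reply saying "in userspace", or on a "paused" event */
void vcpu_ack_pause(int id)
{
	pthread_mutex_lock(&lock);
	vcpu_not_running(id);
	being_paused[id] = false;
	pthread_mutex_unlock(&lock);
}

The two condition variables map directly onto the ones in the
pseudocode: pause_vm() waits on no_running_vcpus, while delayed event
replies wait on no_pending_pause_vm_requests.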
-- 
Mihai Donțu