From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:51545) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TsHIc-0002mD-WF for qemu-devel@nongnu.org; Mon, 07 Jan 2013 13:19:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TsHIb-0007Vf-4Q for qemu-devel@nongnu.org; Mon, 07 Jan 2013 13:19:50 -0500
Received: from e9.ny.us.ibm.com ([32.97.182.139]:59332) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TsHIb-0007Vb-0T for qemu-devel@nongnu.org; Mon, 07 Jan 2013 13:19:49 -0500
Received: from /spool/local by e9.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 7 Jan 2013 13:19:48 -0500
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by d01dlp01.pok.ibm.com (Postfix) with ESMTP id AD2DF38C801C for ; Mon, 7 Jan 2013 13:19:46 -0500 (EST)
Received: from d03av05.boulder.ibm.com (d03av05.boulder.ibm.com [9.17.195.85]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id r07IJjKH353626 for ; Mon, 7 Jan 2013 13:19:46 -0500
Received: from d03av05.boulder.ibm.com (loopback [127.0.0.1]) by d03av05.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id r07IJfYS028080 for ; Mon, 7 Jan 2013 11:19:42 -0700
Message-ID: <50EB11B3.50505@linux.vnet.ibm.com>
Date: Mon, 07 Jan 2013 13:19:31 -0500
From: "Jason J. Herne"
MIME-Version: 1.0
References: <1356098191-4998-1-git-send-email-jjherne@us.ibm.com>
 <133FEF92-3C4F-48C8-BF67-E50066EEEF45@suse.de>
 <50E5D29B.6060804@linux.vnet.ibm.com>
 <20130104013812.GB23746@amt.cnet>
 <6A3DF150A5B70D4F9B66A25E3F7C888D06542905@039-SN2MPN1-022.039d.mgd.msft.net>
 <292DDE3D-7B6F-400E-954B-49CA3E284FDB@suse.de>
 <6A3DF150A5B70D4F9B66A25E3F7C888D06542EA8@039-SN2MPN1-022.039d.mgd.msft.net>
 <6A3DF150A5B70D4F9B66A25E3F7C888D06542EF7@039-SN2MPN1-022.039d.mgd.msft.net>
 <9B2CB541-8806-4BDF-A523-FD597BDFA08B@suse.de>
 <6A3DF150A5B70D4F9B66A25E3F7C888D06542FCA@039-SN2MPN1-022.039d.mgd.msft.net>
 <6501413C-7526-42DB-8824-C0638F59985A@suse.de>
 <50E6F479.4090002@linux.vnet.ibm.com>
 <50EAED06.5050501@linux.vnet.ibm.com>
 <984A23BA-E5A3-4CE5-A018-0D4C9940652B@suse.de>
In-Reply-To: <984A23BA-E5A3-4CE5-A018-0D4C9940652B@suse.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
Reply-To: jjherne@linux.vnet.ibm.com
To: Alexander Graf
Cc: Christian Borntraeger, Anthony Liguori, Marcelo Tosatti, "qemu-devel@nongnu.org qemu-devel", Bhushan Bharat-R65777

On 01/07/2013 10:49 AM, Alexander Graf wrote:
>
> On 07.01.2013, at 16:43, Jason J. Herne wrote:
>
>> On 01/04/2013 11:27 AM, Alexander Graf wrote:
>>>
>>> On 04.01.2013, at 16:25, Jason J. Herne wrote:
>>>
>>>> If I've followed the conversation correctly this is what needs to be done:
>>>>
>>>> 1. Remove the level parameters from kvm_arch_get_registers and kvm_arch_put_registers.
>>>>
>>>> 2. Add a new bitmap parameter to kvm_arch_get_registers and kvm_arch_put_registers.
>>>
>>> I would combine these into "replace levels with bitmap".
>>>
>>>> 3. Define a bit that correlates to our current notion of "all runtime registers".
>>>> This bit, and all bits in this bitmap, would be architecture specific.
>>>
>>> Why would that bit be architecture specific? "All runtime registers" == "registers that gdb can access" IIRC. The implementation on what exactly that means obviously is architecture specific, but the bit itself would not be, as the gdbstub wants to be able to synchronize in arch independent code.
>>>
>>
>> How do we want to define these bits? Is it logical to break up the registers into smaller categories and then use masks to create RUNTIME_STATE, FULL_STATE, RESET_STATE? If so, how should we define them? Would they be arch specific and then we'd create the _STATE masks for each architecture?
>
> I see. So you only want to make the name arch independent, but keep its actual backing bits arch specific. I can see how that'd end up being a useful thing to do, yes.
>
> So we could have archs that just define RUNTIME_STATE as ARCH_RUNTIME_STATE and others that define it as ARCH_STATE_REGx | ARCH_STATE_REGy. That way other code may only synchronize less than the full runtime state. Works for me :).
>
>> If we do simply define a bit for each of the above three states instead, they should probably be 100% mutually exclusive to provide the best protection against complicated data synchronization issues (like the original 7/7 patch was trying to prevent). Also, if we can assume 100% mutual exclusion the sync logic becomes trivial:
>>
>> static void do_kvm_cpu_synchronize_state(void *arg)
>> {
>>     struct kvm_cpu_syncstate_args *args = arg;
>>
>>     /* Do not sync regs that are already dirty */
>>     int regs_to_get = args->regmap & ~args->cpu->kvm_vcpu_dirty;
>>
>>     kvm_arch_get_registers(args->cpu, regs_to_get);
>>     args->cpu->kvm_vcpu_dirty |= regs_to_get;
>> }
>>
>> Thoughts?
>
> I like the idea of making the bits 100% mutually exclusive.
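
To make the scheme quoted above concrete, here is a minimal, self-contained sketch of arch-independent state names backed by mutually exclusive arch-specific bits, plus the "skip what is already dirty" sync logic. Everything in it is an assumption for illustration only: the ARCH_STATE_* bits, the SketchCPU type and the sketch_* functions are hypothetical stand-ins, not existing QEMU definitions, and the mock fetch function only records which groups were requested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical arch-specific bits; a real target would pick its own. */
#define ARCH_STATE_GPRS   (1u << 0)
#define ARCH_STATE_FPRS   (1u << 1)
#define ARCH_STATE_CTRL   (1u << 2)
#define ARCH_STATE_TIMERS (1u << 3)

/* Arch-independent names, each backed by a disjoint set of arch bits. */
#define RUNTIME_STATE (ARCH_STATE_GPRS | ARCH_STATE_FPRS)
#define RESET_STATE   (ARCH_STATE_CTRL)
#define FULL_STATE    (ARCH_STATE_TIMERS)

/* The sync logic below relies on the groups never overlapping. */
static_assert((RUNTIME_STATE & RESET_STATE) == 0, "groups must be disjoint");
static_assert((RUNTIME_STATE & FULL_STATE) == 0, "groups must be disjoint");
static_assert((RESET_STATE & FULL_STATE) == 0, "groups must be disjoint");

/* Hypothetical stand-in for the per-vCPU state. */
typedef struct SketchCPU {
    uint32_t dirty;        /* groups whose local copy is newer than the kernel's */
    uint32_t last_fetched; /* groups requested by the most recent fetch */
} SketchCPU;

/* Mock of an arch "get registers" hook: only records the request. */
static void sketch_get_registers(SketchCPU *cpu, uint32_t regmap)
{
    cpu->last_fetched = regmap;
}

/* Fetch only the requested groups that are not already dirty locally. */
static void sketch_synchronize_state(SketchCPU *cpu, uint32_t regmap)
{
    uint32_t regs_to_get = regmap & ~cpu->dirty;

    sketch_get_registers(cpu, regs_to_get);
    cpu->dirty |= regs_to_get;
}
```

With this sketch, syncing RESET_STATE first and then requesting all three groups makes the second call fetch only RUNTIME_STATE | FULL_STATE, so the locally modified reset registers are left alone. That protection only holds if the arch hook actually honors the bitmap it is given.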

I've started writing the code to replace the kvm_arch_put_registers level parameter with a register bitmap, and I'm hitting some problems with the Intel/PPC targets:

1. target-i386/kvm.c : kvm_arch_put_registers() : This function syncs many registers "all the time". I'm not entirely convinced from reading the code that I can easily group these registers into the default mutually exclusive groupings (runtime, reset, full). Some of the comments seem to imply an ordering. Also, a massive data structure is prepared and then a single ioctl call is used to do the register sync, so I'm not sure whether the ioctl expects to always see certain registers. Things get messier when considering the MSRs in kvm_put_msrs().

2. Currently no architecture has code to selectively GET register state (hence the initial patch set). If we do not fully implement this now for all KVM targets, then the selective syncing implemented in do_kvm_cpu_synchronize_state() will fail. Consider the case where we sync the reset group of registers and make some of them dirty. If a separate task then syncs the full state, a kvm_arch_get_registers that ignores the bitmap will pull the reset regs down again and the local changes will be lost. Sure, we could hack around it, but we would just be reverting to a "level" based system on top of our bitmap.

It looks like this is going to be a decent amount of work. I do not believe I have the platform specific knowledge (nor would I have the time) to make the changes required to the x86 and PPC platform specific code. Perhaps others might volunteer if this is worth the effort :)? Else, perhaps we should examine a simpler solution to the problem? Any chance the originally submitted patch set would suffice?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)