From mboxrd@z Thu Jan 1 00:00:00 1970 From: Harry Butterworth Subject: Re: Is continuous replication of state possible? Date: Sun, 09 Jan 2005 02:00:45 +0000 Message-ID: <1105236045.3134.66.camel@localhost> References: <20050107100119.GW8251@cl.cam.ac.uk> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: xen-devel-admin@lists.sourceforge.net Errors-To: xen-devel-admin@lists.sourceforge.net List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , List-Archive: To: Ian.Pratt@cl.cam.ac.uk Cc: xen-devel@lists.sourceforge.net List-Id: xen-devel@lists.xenproject.org Ian Pratt wrote: > Software-implemented hardware fault-tolerance is on the Xen > research roadmap. > > It basically just requires deterministic execution and event > injection. Doing this for uniprocessor guests is fairly straight > forward. Doing it for SMP guests (with decent performance) is > going to be a huge challenge, as determinism is hard to achieve. We're > looking in to it... I thought a little about trying to implement a byzantine fault tolerant SMP virtual machine using state replication. I read Barbara Liskov's group's work for some BFT distributed consensus protocols and there is some good work out there on BFT routing in ad-hoc networks and a little bit on BFT authentication but I didn't find much on the SMP issues. I wanted to try out two approaches: 1) Putting each CPU into a separate BFT context and suffering the consensus protocol overhead in inter-cpu communications. I thought this would probably work OK for replicated servers which were relatively near each other (for example on IBM's proposed CIB hardware) but I was also interested in global replication for continuity through disasters so I was considering 2) Assuming a standard execution rate for instructions and speculatively executing them non-deterministically in parallel. Non-determinism is only significant when communication between CPUs happens in the wrong order (for example when an instruction speculatively assigned to a particular time-step on one cpu writes to a cache line which has already been read from by an instruction speculatively assigned to a later time-step by another cpu) this must be detected at which point the local replica has to roll-back and then roll forwards more carefully to get the serialisation right. I think this would work better when the replicas are separated geographically because there is no BFT protocol overhead for inter-cpu communication. In both cases, I was going to emulate CPUs rather than try to use the real CPU or any hardware features. With emulation, I wanted to try out using a high emulated:real CPU ratio and/or deep pipelines to ameliorate the inherent stall problem of the first approach. Emulation seems to be a requirement for investigating the second approach. I think there's also a possibility of using dynamic translation to make either of these approaches go faster. With dynamic translation, either approach might get to within a small factor of the theoretical limit for BFT. Not good for scientific performance computing but the BFT, global replication, massive parallelism and single system image features would be great for a lot of business applications. I'm very curious, do you have another approach? Any thoughts about real-time? Harry. -- Harry Butterworth ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt