From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Dong, Eddie" <eddie.dong@intel.com>
Cc: Hongyang Yang <yanghy@cn.fujitsu.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
FNST-Gui Jianfeng <GuiJianfeng@cn.fujitsu.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Wen Congyang <wency@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [RFC] COLO HA Project proposal
Date: Fri, 4 Jul 2014 09:35:46 +0100
Message-ID: <20140704083546.GC2425@work-vm>
In-Reply-To: <A12AC9D104E08D47BAF23C492F83C53B25883B3B@SHSMSX104.ccr.corp.intel.com>
* Dong, Eddie (eddie.dong@intel.com) wrote:
> > >
> > > I didn't quite understand a couple of things though, perhaps you can
> > > explain:
> > > 1) If we ignore the TCP sequence number problem, in an SMP machine
> > > don't we get other sources of randomness - e.g. which core completes something
> > > first, or who wins a lock contention, so the output stream might not
> > > be identical - so do those normal bits of randomness cause the
> > > machines to flag as out-of-sync?
> >
> > It's about the COLO agent; CCing Congyang, who can give a detailed
> > explanation.
> >
>
> Let me clarify this issue. COLO doesn't ignore the TCP sequence number; it uses a
> new implementation that makes the sequence number identical, on a best-effort basis,
> between the primary VM (PVM) and the secondary VM (SVM). Likewise, the VMM has to synchronize
> the emulation of the random number generation mechanism between the
> PVM and SVM, as the lock-stepping mechanism does.
>
> Furthermore, for long-lived TCP connections, we can rely on (on-demand) VM checkpoints to get
> identical sequence numbers in both the PVM and SVM.
That wasn't really my question; I was worried about other forms of randomness,
such as winners of lock contention and other SMP non-determinism,
and I'm also worried about what proportion of the time the system can't recover
from a failure because it is unable to distinguish an SVM failure from
a randomness issue.
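A minimal sketch of the concern, assuming a COLO-style proxy that compares each PVM/SVM output packet, releases it when both match, and requests an on-demand checkpoint when they differ. The names `Action`, `compare_outputs` and `render_reply` are hypothetical illustrations, not COLO or QEMU code:

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    RELEASE = "release"        # outputs match: forward the PVM's packet to the client
    CHECKPOINT = "checkpoint"  # outputs differ: resynchronise the SVM from the PVM


def compare_outputs(pvm_pkt: bytes, svm_pkt: bytes) -> Action:
    """Hypothetical per-packet output comparison between PVM and SVM."""
    if pvm_pkt == svm_pkt:
        return Action.RELEASE
    # A mismatch alone cannot tell benign SMP non-determinism (e.g. a
    # different winner of a lock contention) from a real SVM fault, so the
    # conservative response is a checkpoint rather than a failover.
    return Action.CHECKPOINT


def render_reply(completion_order: Sequence[str]) -> bytes:
    # Hypothetical guest behaviour: the reply lists work items in the
    # order the vCPUs happened to complete them.
    return ",".join(completion_order).encode()


pvm = render_reply(["req1", "req2"])
svm = render_reply(["req2", "req1"])  # the other vCPU won the race on the SVM
print(compare_outputs(pvm, svm).value)  # checkpoint
```

Every such divergence costs a checkpoint, and the comparator has no way to label it as benign or as a fault.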
Dave
>
>
> Thanks, Eddie
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thread overview: 20+ messages
2014-06-24 2:08 [RFC] COLO HA Project proposal Hongyang Yang
2014-06-24 2:08 ` [Qemu-devel] " Hongyang Yang
2014-07-01 12:12 ` Dr. David Alan Gilbert
2014-07-01 12:12 ` Dr. David Alan Gilbert
2014-07-03 3:42 ` Hongyang Yang
2014-07-03 3:42 ` Hongyang Yang
2014-07-04 8:31 ` Dong, Eddie
2014-07-04 8:31 ` Dong, Eddie
2014-07-04 8:35 ` Dr. David Alan Gilbert [this message]
2014-07-04 8:35 ` Dr. David Alan Gilbert
2014-07-04 8:54 ` Dong, Eddie
2014-07-04 8:54 ` Dong, Eddie
2014-07-04 12:22 ` Dr. David Alan Gilbert
2014-07-04 12:22 ` Dr. David Alan Gilbert
2014-07-04 15:55 ` Dong, Eddie
2014-07-04 15:55 ` Dong, Eddie
2014-07-08 6:06 ` Michael R. Hines
2014-07-08 6:26 ` Hongyang Yang
2014-07-08 6:26 ` Hongyang Yang
2014-07-04 11:22 ` Andreas Färber