Message-ID: <505204F4.3040300@redhat.com>
Date: Thu, 13 Sep 2012 19:08:20 +0300
From: Avi Kivity
In-Reply-To: <87ehm5or07.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, Michael Roth, qemu-devel@nongnu.org, Paolo Bonzini, Luiz Capitulino, Eric Blake

On 09/13/2012 06:56 PM, Anthony Liguori wrote:
>>>
>> Hmm, true. What about hooking into suspend and doing vmstop during
>> suspend.
>
> Is suspend the only foreseeable way for this problem to happen? I don't
> think it is which is what concerns me about any approach that relies on
> "hooking suspend".

No, SIGSTOP/SIGCONT (can hook SIGCONT), gdb (can't hook but is very
rare), ENOSPACE + wait for more space to be provisioned (already known
to qemu), NFS access qemu core on dead server, severe swapstorms.

> Also, I don't think there is a generic way to "hook suspend".

That is what we have Lennart for.

>>> >> This could happen because of stop, host suspend, live migration to a
>>> >> file, etc.
>>> >>
>>> >> It's much easier for us to call into qemu-ga to do the time correction
>>> >> whenever this event occurs than to try and have libvirt figure out when
>>> >> it's necessary.
>>> > And if guest does not have qemu-ga what is better inject interrupts like
>>> > crazy for next 2 minutes or leave guest with incorrect time?
>>>
>>> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
>>> for a prolonged period of time isn't fixable.
>>>
>> You mean yes to "leave guest with incorrect time"? QEMU will still
>> consume 100% of cpu for some time calling qemu_timer callback millions
>> times. timedrift code is not the right level to fix that.
>
> Not if we put a cap on how many interrupts we'll try to catch up.
>
> As I mentioned previously, if we accrue more than X number of missed
> ticks, we should simply declare bankruptcy and reset the counter.

If we know we're missing N ticks, we can simply pass N to the handler.

>
> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
> reset the guest's clock based on reading the hardware clock via a
> 'guest-resync-time' command.
>
> If it isn't, time will be off. Hopefully the guest is running NTP and
> can correct itself. Otherwise, at least the admin can manually fix the
> time.

There is also the fake S3 (post host resume) that can get the guest to
read its RTC.

-- 
error compiling committee.c: too many arguments to function
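
For illustration, the capped catch-up policy discussed in this thread could look roughly like the sketch below. It is written in Python for brevity rather than QEMU's C, and it is not QEMU code: `decide_catchup`, `CatchupDecision`, and `max_catchup` are hypothetical names, and the resync flag only stands in for the 'guest-resync-time' qemu-ga command proposed above, which is assumed here rather than known to exist.

```python
from dataclasses import dataclass


@dataclass
class CatchupDecision:
    ticks_to_deliver: int   # missed ticks handed to the handler in one call
    resync_requested: bool  # True when the backlog was dropped


def decide_catchup(missed_ticks: int, max_catchup: int) -> CatchupDecision:
    """Cap the backlog: beyond max_catchup, drop it and ask for a clock resync."""
    if missed_ticks < 0 or max_catchup < 0:
        raise ValueError("tick counts must be non-negative")
    if missed_ticks > max_catchup:
        # "Declare bankruptcy": reset the counter instead of replaying it.
        # The caller would then ask the guest agent (if present) to resync.
        return CatchupDecision(ticks_to_deliver=0, resync_requested=True)
    # Small backlog: pass the whole count N to the handler once,
    # rather than firing the timer callback N times.
    return CatchupDecision(ticks_to_deliver=missed_ticks, resync_requested=False)
```

With a cap of 1000, a backlog of 5 ticks would be delivered in a single handler call, while a backlog of 120000 ticks (for example after a long host suspend) would be dropped and flagged for resync. What the cap should actually be is left open in the thread.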